Machine learning and natural language processing for code quality analysis in introductory programming courses
Abstract/Contents
- Abstract
- The employment rate of software developers has risen significantly over the last 30 years, and as a result more students are considering computer science as a career path. Over the last 15 years, enrollment in introductory programming courses (CS1) has grown much faster than the number of CS faculty, with no apparent sign of slowing. Thus, a scalability problem clearly exists. Technology has also opened up learning opportunities to a wide audience: millions of people use Massive Open Online Course (MOOC) providers, and hundreds of thousands enroll in online CS courses and coding boot camps. Automated assessment helps instructors keep pace with the overwhelming workload, but it focuses mostly on functionality. Software development is much more than writing working programs, yet code quality assessment remains a manual process. This dissertation focuses on critical qualitative aspects of CS1 programs and on how technology can take on time-consuming tasks that currently fall to humans. First, I cover readability, one of the fundamental code quality standards, and focus on its cornerstone in CS1 programs: function names. An identifier that clearly captures the intended task makes code readable and self-documenting. I present and examine a semi-automated software system that I built to improve function names. It uses a variation of the Naïve Bayes classifier to assess the quality of identifiers and then suggests alternatives for poor ones. Second, I study the relationship between problem-related entities and functional decomposition. I introduce a method for quantifying how broad a student's view of the problem is by the time they start coding. I then describe the software implementation and explain how I used natural language processing (NLP) to detect problem-related entities, a key stage in this process.
Using this system, I classify students at scale and determine how the breadth of their view of the problem affects their performance, the time they need to solve a programming challenge, and the complexity of their solution. Third, I introduce a systematic approach to detecting when novice programmers decompose their code and identifying what drives that decision. I describe a software system that I built to perform these tasks, then use it to classify students and explore how these classifications relate to program complexity and student performance. Lastly, I introduce an alternative to standard testing approaches for validating functionality. My solution relies on code instrumentation. Its main advantages are that it requires substantially less code and can test programs with nondeterministic behavior or user input. Then, I present Delve, an educational tool that I created for instructors and teaching assistants in introductory programming courses. Delve integrates many ideas from my earlier systems and bundles them into an easy-to-use graphical user interface.
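The abstract states only that the function-name checker uses "a variation of" the Naïve Bayes classifier; that variation is not described here. For orientation, the sketch below shows a plain multinomial Naïve Bayes classifier with Laplace smoothing that labels function names as "good" or "poor" from their word tokens. It is not the dissertation's model: the tokenizer `split_identifier`, the class `NaiveBayesNameClassifier`, the two labels, and the toy training names are all hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(name: str) -> list[str]:
    """Split a snake_case or camelCase identifier into lowercase word tokens."""
    tokens = []
    for part in name.split("_"):
        # Capture lowercase words (optionally capitalized), acronyms, and digit runs.
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [t.lower() for t in tokens]


class NaiveBayesNameClassifier:
    """Hypothetical multinomial Naive Bayes with add-alpha smoothing over name tokens."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()       # training names per label
        self.token_counts: dict[str, Counter] = {}   # token frequencies per label
        self.vocab: set[str] = set()

    def fit(self, names: list[str], labels: list[str]) -> "NaiveBayesNameClassifier":
        for name, label in zip(names, labels):
            tokens = split_identifier(name)
            self.class_counts[label] += 1
            self.token_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, name: str) -> dict[str, float]:
        """Unnormalized log P(label) + sum of log P(token | label)."""
        total_names = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in split_identifier(name) if t in self.vocab]
        scores = {}
        for label, n_names in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_names / total_names)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, name: str) -> str:
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Hypothetical toy training set: descriptive names vs. vague ones.
names = ["calculate_average", "get_user_name", "print_board", "is_valid_move",
         "count_vowels", "f", "do_stuff", "helper", "func2", "thing"]
labels = ["good"] * 5 + ["poor"] * 5
clf = NaiveBayesNameClassifier().fit(names, labels)

print(clf.predict("calculate_total"))  # good
print(clf.predict("doStuff2"))         # poor
```

Scoring in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps a single token that never appeared with a label from ruling that label out entirely. A real system would need far more training data and, per the abstract, a separate step to suggest better names.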
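The abstract does not explain how the instrumentation-based validation works. As one possible illustration, and not the dissertation's or Delve's implementation, the Python sketch below instruments a single function in a student program, records every call made during a normal run, and checks each recorded result against a reference implementation on the same arguments. Because correctness is judged call by call rather than by matching fixed console output, random values and scripted user input do not break the check. The program text `STUDENT_SOURCE` and the helpers `record_calls`, `run_instrumented`, and `reference_is_even` are hypothetical.

```python
import contextlib
import functools
import io
from typing import Any, Callable
from unittest import mock

# Hypothetical student program: reads a round count, then checks random numbers.
STUDENT_SOURCE = '''
import random

def is_even(n):
    return n % 2 == 0

def main():
    rounds = int(input("How many rounds? "))
    for _ in range(rounds):
        n = random.randint(1, 100)
        print(n, "is even" if is_even(n) else "is odd")
'''


def record_calls(func: Callable, log: list) -> Callable:
    """Wrap func so every call appends (args, kwargs, result) to log."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = func(*args, **kwargs)
        log.append((args, kwargs, result))
        return result
    return wrapper


def run_instrumented(source: str, func_name: str, reference: Callable,
                     inputs=(), entry: str = "main"):
    """Run a program with one function instrumented; return (log, mismatches)."""
    namespace: dict = {}
    exec(source, namespace)  # define the student's functions in a fresh namespace
    log: list = []
    # Replace the function in the program's globals so internal calls hit the wrapper.
    namespace[func_name] = record_calls(namespace[func_name], log)
    # Feed scripted answers to input() and silence the program's printed output.
    with mock.patch("builtins.input", side_effect=list(inputs)), \
            contextlib.redirect_stdout(io.StringIO()):
        namespace[entry]()
    # Judge each observed call against the reference on the same arguments,
    # so the check holds whatever random values the run happened to use.
    mismatches = [(args, kwargs, result) for args, kwargs, result in log
                  if reference(*args, **kwargs) != result]
    return log, mismatches


def reference_is_even(n: int) -> bool:
    return n % 2 == 0


log, mismatches = run_instrumented(STUDENT_SOURCE, "is_even", reference_is_even,
                                   inputs=["5"])
print(len(log), len(mismatches))  # 5 0
```

Running untrusted student code this way would also require timeouts and sandboxing, which this sketch omits.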
Description
Field | Value |
---|---|
Type of resource | text |
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2023 |
Publication date | 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Role | Name |
---|---|
Author | Charitsis, Charalampos-S |
Degree supervisor | Mitchell, John C |
Thesis advisor | Mitchell, John C |
Thesis advisor | Boneh, Dan, 1969- |
Thesis advisor | Piech, Chris (Christopher) |
Degree committee member | Boneh, Dan, 1969- |
Degree committee member | Piech, Chris (Christopher) |
Associated with | Stanford University, School of Engineering |
Associated with | Stanford University, Department of Electrical Engineering |
Subjects
Field | Value |
---|---|
Genre | Theses |
Genre | Text |
Bibliographic information
Field | Value |
---|---|
Statement of responsibility | Charis Charitsis |
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Ph.D. thesis, Stanford University, 2023 |
Location | https://purl.stanford.edu/zm348cp2639 |
Access conditions
- Copyright
- © 2023 by Charalampos-S Charitsis
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).