Machine learning and natural language processing for code quality analysis in introductory programming courses

Abstract/Contents

Abstract
The employment rate of software developers has risen significantly over the last 30 years. As a result, more students are considering computer science as a career path. Over the last 15 years, enrollment in introductory programming courses (CS1) has grown much faster than the number of CS faculty, with no apparent signs of slowing, which creates a clear scalability problem. Technology has opened up learning opportunities to a wide audience: millions of people use Massive Open Online Course (MOOC) providers, and hundreds of thousands enroll in online CS courses and coding boot camps. Automated assessment helps instructors keep pace with the overwhelming workload, but it focuses mostly on functionality. Software development, however, is much more than writing working programs, and code quality assessment remains a manual process. This dissertation focuses on critical qualitative aspects of CS1 programs and on how technology can take over time-consuming human tasks.

First, I cover one of the fundamental code quality standards, readability, and focus on its cornerstone in CS1 programs: function names. An identifier that clearly captures the intended task makes code readable and self-documenting. I present and examine a semi-automated software system that I built to improve function names. It uses a variation of the Naïve Bayes classifier to assess the quality of identifiers and then suggests alternatives for the poor ones.

Second, I study the relationship between problem-related entities and functional decomposition. I introduce a method for quantifying how broad a student's view of the problem is by the time they start coding. I then describe the software implementation and explain how I used natural language processing (NLP) to detect problem-related entities, a key stage in this process. Finally, I use the system to classify students at scale and determine how the breadth of their view of the problem affects their performance, the time they need to solve a programming challenge, and the complexity of their solution.

Third, I introduce a systematic approach to detecting when novice programmers decompose their code and identifying what drives that decision. I detail a software system that I built to implement these tasks, and then use it to classify students and explore how their decomposition behavior relates to program complexity and student performance.

Lastly, I introduce an alternative to standard testing approaches for validating functionality. My solution relies on code instrumentation. Its main advantages are that it requires substantially less code and can test programs with nondeterministic behavior or user input. I then present Delve, an educational tool that I created for instructors and teaching assistants in introductory programming courses. Delve integrates many ideas from my previous systems and bundles them into an easy-to-use graphical user interface.
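The abstract does not specify which Naïve Bayes variant is used to assess function names. Purely as an illustration, the Python sketch below shows a standard multinomial Naïve Bayes classifier with Laplace smoothing that labels a function name "good" or "poor" from the word tokens it contains. The tokenizer, the two labels, the class name `NaiveBayesNameClassifier`, and the toy training data are all hypothetical. This is not the dissertation's system, and it does not suggest alternative names.

```python
import math
import re
from collections import Counter, defaultdict


def split_identifier(name: str) -> list[str]:
    """Split snake_case / camelCase identifiers into lowercase word tokens."""
    tokens = []
    for part in re.split(r"_+", name):
        # Words like "User", acronyms like "HTTP", and digit runs like "1".
        tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [t.lower() for t in tokens]


class NaiveBayesNameClassifier:
    """Hypothetical multinomial Naive Bayes over identifier tokens."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                     # Laplace smoothing constant
        self.class_counts = Counter()          # number of names per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            tokens = split_identifier(name)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, name):
        """Log prior plus summed log likelihoods; unseen tokens are ignored."""
        tokens = [t for t in split_identifier(name) if t in self.vocab]
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(count / total)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Hypothetical toy training data, for illustration only.
TOY_NAMES = ["calculate_average", "get_user_name", "print_board", "count_vowels",
             "f", "do_stuff", "func1", "thing"]
TOY_LABELS = ["good"] * 4 + ["poor"] * 4

if __name__ == "__main__":
    clf = NaiveBayesNameClassifier().fit(TOY_NAMES, TOY_LABELS)
    for candidate in ["calculate_total", "do_thing"]:
        print(candidate, clf.predict(candidate))
```

Working in log space avoids floating-point underflow when a name has many tokens. A real system would need a much larger labeled corpus and a richer feature set than raw tokens.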
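The abstract also does not describe how the instrumentation-based testing works. The following minimal sketch assumes Python student programs and uses the standard library's `sys.settrace` hook as the instrumentation mechanism. The dissertation's approach may differ, for example by rewriting source code. The sketch runs a program with scripted input, records each call (with its arguments) and return value of the functions the program defines, and captures printed output. The names `run_instrumented`, `returns_match`, `STUDENT_PROGRAM`, and the `<student>` filename are hypothetical.

```python
import builtins
import io
import sys
from contextlib import redirect_stdout
from unittest import mock

STUDENT_FILE = "<student>"  # hypothetical filename used to tag student code


def run_instrumented(source: str, inputs: list[str]):
    """Run student source with scripted input; record calls/returns of its functions."""
    events = []

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != STUDENT_FILE:
            return None  # do not trace harness or library frames
        name = frame.f_code.co_name
        if name != "<module>":
            if event == "call":
                events.append(("call", name, dict(frame.f_locals)))  # arguments
            elif event == "return":
                events.append(("return", name, arg))  # return value
        return tracer

    feed = iter(inputs)
    stdout = io.StringIO()
    code = compile(source, STUDENT_FILE, "exec")
    sys.settrace(tracer)
    try:
        # Replace input() with a scripted feed and capture everything printed.
        with mock.patch.object(builtins, "input", lambda prompt="": next(feed)), \
                redirect_stdout(stdout):
            exec(code, {"__name__": "__main__"})
    finally:
        sys.settrace(None)
    return events, stdout.getvalue()


def returns_match(events, name, expected):
    """Pair each return of `name` with its most recent unmatched call and
    check the returned value against expected(**args)."""
    pending = []
    for kind, fname, payload in events:
        if fname != name:
            continue
        if kind == "call":
            pending.append(payload)
        elif payload != expected(**pending.pop()):
            return False
    return True


STUDENT_PROGRAM = '''
def square(n):
    return n * n

x = int(input("Number: "))
print(square(x))
'''

if __name__ == "__main__":
    events, output = run_instrumented(STUDENT_PROGRAM, ["7"])
    print(events)
    print(returns_match(events, "square", lambda n: n * n))
```

Because `returns_match` checks each recorded call against a property rather than a single fixed expected output, the same check would still apply if the program's inputs varied from run to run.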

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Charitsis, Charalampos-S
Degree supervisor Mitchell, John C
Thesis advisor Mitchell, John C
Thesis advisor Boneh, Dan, 1969-
Thesis advisor Piech, Chris (Christopher)
Degree committee member Boneh, Dan, 1969-
Degree committee member Piech, Chris (Christopher)
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Charis Charitsis.
Note Submitted to the Department of Electrical Engineering.
Thesis Ph.D., Stanford University, 2023.
Location https://purl.stanford.edu/zm348cp2639

Access conditions

Copyright
© 2023 by Charalampos-S Charitsis
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
