Analyzing Parole Hearings with Natural Language Processing


The parole process is a vital component of America's criminal justice system, allowing eligible candidates to be released from potentially indeterminate life sentences. In the state of California, thousands of parole hearings are conducted each year. Despite this immense volume, studies of the efficacy and fairness of the parole process are largely limited to close analyses of small selections of hearings. Recent advances in natural language processing have made it feasible to automatically examine vast quantities of text, but these methods are largely untested in the domain of parole hearings. On a novel dataset of roughly 35,000 parole hearings from the state of California, we perform linguistic experiments using both curated lexicons and model-based predictions to characterize the language used in hearings, conditioning on a candidate's race/ethnicity and their type of legal representation. We find that these factors predict striking differences in language use by both parole commissioners and attorneys. Our results confirm earlier studies and indicate new directions for future research. Most notably, we find across a variety of measures that privately retained attorneys are both more active in parole hearings and more likely to make use of the relevant legal precedents than their government-appointed counterparts. Broadly speaking, our results demonstrate the promise of unsupervised methods when applied carefully to high-impact domains. However, we also note cases in which expert guidance is necessary to draw conclusions from unsupervised experiments. We argue that a combination of both approaches will be necessary to fully unlock the promise of data-driven methods.


Type of resource: text
Date created: June 4, 2021


Author: Todd, Graham
Degree granting institution: Stanford University, Department of Symbolic Systems
Primary advisor: Potts, Christopher
Advisor: Jurafsky, Dan


Subject: Parole
Subject: NLP
Subject: Word Vectors
Subject: Lexicons
Subject: Legal NLP
Subject: Symbolic Systems
Subject: Stanford University
Genre: Thesis

Bibliographic information

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).

Preferred citation

Todd, Graham. (2021). Analyzing Parole Hearings with Natural Language Processing. Stanford Digital Repository. Available at:


Undergraduate Honors Theses, Symbolic Systems Program, Stanford University

