Insights from patient authored text : from close reading to automated extraction


Abstract/Contents

Abstract
Millions of people collaborate online with others who share their health concerns. In the process, these users perform complex health-related tasks, such as differential diagnosis and treatment comparison. The result is a massive, growing and readily accessible corpus of patient authored text (PAT) that documents patients' behavior outside of the clinical environment. As a result, PAT can provide insights into otherwise obscure topics, such as why patients follow only certain parts of a treatment protocol, or how people self-treat stigmatized conditions such as prescription drug addiction. Despite the potential value of PAT, attempts to extract medically relevant insights from it have been limited. PAT is notoriously noisy and challenging to work with, and there is a dearth of methods and tools for processing and analyzing it. Moreover, the specific research questions that PAT can support are not obvious: determining what data PAT encodes, and how, is a challenge in and of itself. In this thesis, I develop methods for automatically extracting medically relevant data from PAT. I focus specifically on the topic of addiction: a stigmatized and prevalent medical condition. Building on close readings of source text to inform schema induction, data annotation, and feature engineering, I train classifiers that accurately identify (1) medically relevant terms in PAT; (2) users' motivations for participating in an addiction-related online health community; (3) users' drugs of choice; and (4) users' transitions through relapse and recovery. Using these classifiers to scale analyses to large PAT corpora, I derive novel insights into the process of addiction, as well as the role that online health communities play in giving users informational and emotional support and, ultimately, in enabling recovery.
In concert, these contributions both underscore PAT's latent value for illuminating poorly understood or clandestine medical topics, and offer viable methods that dramatically improve our ability to realize this value.

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource.
Publication date: 2015
Issuance: monographic
Language: English

Creators/Contributors

Associated with: MacLean, Diana Lynn
Associated with: Stanford University, Department of Computer Science.
Primary advisor: Heer, Jeffrey Michael
Thesis advisor: Heer, Jeffrey Michael
Thesis advisor: Bernstein, Michael
Thesis advisor: Card, Stuart K
Thesis advisor: Manning, Christopher D

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Diana Lynn MacLean.
Note: Submitted to the Department of Computer Science.
Thesis: Thesis (Ph.D.)--Stanford University, 2015.
Location: electronic resource

Access conditions

Copyright
© 2015 by Diana Lynn MacLean
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
