The digital patient : machine learning techniques for analyzing electronic health record data


Abstract/Contents

Abstract
The current unprecedented rate of digitization of longitudinal health data (continuous device monitoring data, laboratory measurements, medication orders, treatment reports, and reports of physician assessments) allows visibility into patient health at increasing levels of detail. A clearer lens into this data could help improve decision making both for individual physicians on the front lines of care and for policy makers setting national direction. However, this type of data is high-dimensional (an infant with no prior clinical history can have more than 1000 different measurements in the ICU), highly unstructured (measurements occur irregularly, and different numbers and types of measurements are taken for different patients), and heterogeneous (ranging from ultrasound assessments to lab tests to continuous monitor data). Furthermore, the data is often sparse and systematically missing, and the underlying system is non-stationary. Extracting the full value of the existing data requires new approaches. In this thesis, we develop methods that show how longitudinal health data contained in Electronic Health Records (EHRs) can be harnessed to make novel clinical discoveries. Such discovery requires access to patient outcome data: which patients have which complications. We present a method for automated extraction of patient outcomes from EHR data. It shows how natural language cues from physicians' notes can be combined with clinical events that occur during a patient's hospital stay to extract significantly higher-quality annotations than previous state-of-the-art systems. We also develop methods for exploratory analysis and structure discovery in bedside monitor data. Although this data forms the bulk of the data collected on any patient, it is not used in any substantive way after collection. We present methods to discover recurring shape and dynamic signatures in this data.
While we primarily focus on clinical time series, our methods also generalize to other continuous-valued time series data. Our analysis of the bedside monitor data led us to a novel use of this data for risk prediction in infants. Using features automatically extracted from physiologic signals collected in the first 3 hours of life, we develop Physiscore, a tool that identifies infants at risk for major downstream complications. Physiscore is both fully automated and significantly more accurate than the current standard of care. It can be used for resource optimization within a NICU, for managing infant transport to a higher level of care, and for parental counseling. Overall, this thesis illustrates how machine learning applied to these large-scale digital patient data repositories can yield new clinical discoveries and potentially useful tools for improving patient care.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with Saria, Suchi
Associated with Stanford University, Computer Science Department
Primary advisor Koller, Daphne
Thesis advisor Koller, Daphne
Thesis advisor Penn, Anna Asher
Thesis advisor Thrun, Sebastian, 1967-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Suchi Saria.
Note Submitted to the Department of Computer Science.
Thesis Ph.D. Stanford University 2011
Location electronic resource

Access conditions

Copyright
© 2011 by Suchi Saria
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
