Towards cost-effective and trustworthy healthcare machine learning

Saab, Khaled Kamal

Abstract/Contents

Abstract: Machine learning (ML) has made exciting progress across many healthcare tasks, such as chest X-ray interpretation and seizure detection from electroencephalograms (EEGs). However, while ML models exhibit impressive overall performance, we identify two critical roadblocks to model development and deployment: (1) reliance on massive training datasets created via manual expert labeling, resulting in high annotation costs, and (2) reliance on non-generalizable features, resulting in unexpectedly poor performance on subgroups of patients ("hidden stratification"). As a result, healthcare ML is currently costly to develop and can be untrustworthy to deploy. To address the first roadblock, we develop new forms of cost-effective supervised training of ML models with cross-modal data programming (XMDP). In particular, healthcare data are often accompanied by other data modalities that contain task-related information for which it is feasible to create labeling functions at low cost, such as clinical reports, workflow notes, or (in the future) gaze data (i.e., eye-tracking data). We develop labeling functions that map the auxiliary data modalities (either text or gaze) to labels using only a small number of manual labels. Once we develop these labeling functions, we are able to scale labeled training sets without additional annotation costs. To address the second roadblock, we propose to improve the robustness of ML models to hidden stratification by increasing task specificity in two exemplar medical tasks: pneumothorax detection in medical imaging and seizure detection in EEG time series data. For medical imaging, as opposed to training a binary classification model for detecting pneumothorax (low task specificity), we first train an image segmentation model to localize pneumothorax (higher task specificity) and use the segmentation output to derive the binary prediction. For seizure detection on EEG, we increase task specificity by training a model to classify additional attributes in the EEG, such as artifacts. We find that increasing task specificity significantly reduces reliance on non-generalizable features and improves performance among clinically meaningful subgroups, which decreases performance gaps among subgroups and results in more trustworthy models. In summary, our work investigates new forms of model supervision that are less costly than existing approaches, and uncovers strong connections between task specificity and model robustness. While our experiments focus on chest X-ray classification and EEG seizure detection, our proposed methods are applicable to a wider range of healthcare applications.
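
As a rough illustration of two ideas named in the abstract, the sketch below shows (a) labeling functions that map report text to weak labels and (b) a binary prediction derived from a segmentation output. It is not the thesis's implementation. The function names, keyword rules, thresholds, and the majority-vote combiner are all hypothetical, chosen only for illustration; data programming typically combines labeling-function votes with a learned label model, for which majority vote is a simple stand-in here.

```python
import re
from collections import Counter

# Label conventions (assumed for this sketch).
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical negation pattern: a negation cue earlier in the same sentence.
NEGATED = re.compile(r"\b(no|without|negative for)\b[^.]*pneumothorax")


def lf_mentions_finding(report: str) -> int:
    """Vote POSITIVE when the finding is mentioned and not negated."""
    text = report.lower()
    if "pneumothorax" not in text or NEGATED.search(text):
        return ABSTAIN
    return POSITIVE


def lf_negated_finding(report: str) -> int:
    """Vote NEGATIVE when the finding is explicitly negated."""
    return NEGATIVE if NEGATED.search(report.lower()) else ABSTAIN


def lf_normal_study(report: str) -> int:
    """Vote NEGATIVE on a stock 'normal study' phrase (illustrative rule)."""
    return NEGATIVE if "no acute cardiopulmonary" in report.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_finding, lf_negated_finding, lf_normal_study]


def majority_vote(report: str, lfs=LABELING_FUNCTIONS) -> int:
    """Combine labeling-function votes; abstain on no votes or on a tie."""
    votes = [v for v in (lf(report) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN
    return top


def binary_from_mask(prob_mask, pixel_threshold=0.5, min_pixels=1) -> int:
    """Derive an image-level label from per-pixel segmentation probabilities.

    The image is called POSITIVE when at least `min_pixels` pixels reach
    `pixel_threshold`. Both parameters are assumptions of this sketch.
    """
    positive_pixels = sum(p >= pixel_threshold for row in prob_mask for p in row)
    return POSITIVE if positive_pixels >= min_pixels else NEGATIVE
```

With rules like these in place, unlabeled reports can be labeled in bulk by calling `majority_vote` on each one, and reports on which every function abstains are simply left out of the weakly labeled training set.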

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Saab, Khaled Kamal
Degree supervisor	Re, Chris
Degree supervisor	Rubin, Daniel
Thesis advisor	Re, Chris
Thesis advisor	Rubin, Daniel
Thesis advisor	Lee-Messer, Christopher
Thesis advisor	Pauly, John
Thesis advisor	Pilanci, Mert
Degree committee member	Lee-Messer, Christopher
Degree committee member	Pauly, John
Degree committee member	Pilanci, Mert
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Khaled K. Saab.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/wr751rz0386

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
