Towards cost-effective and trustworthy healthcare machine learning
Abstract/Contents
- Abstract
- Machine learning (ML) has made exciting progress across many healthcare tasks, such as chest X-ray interpretation and seizure detection from electroencephalograms (EEGs). However, while ML models exhibit impressive overall performance, we identify two critical roadblocks to model development and deployment: (1) reliance on massive training datasets created via manual expert labeling, resulting in high annotation costs, and (2) reliance on non-generalizable features, resulting in unexpectedly poor performance on subgroups of patients ("hidden stratification"). As a result, healthcare ML is currently costly to develop and can be untrustworthy to deploy. To address the first roadblock, we develop new forms of cost-effective supervised training of ML models with cross-modal data programming (XMDP). In particular, healthcare data are often accompanied by other data modalities that contain task-related information and for which it is feasible to create labeling functions at low cost, such as clinical reports, workflow notes, or (in the future) gaze data (i.e., eye-tracking data). We develop labeling functions that map these auxiliary data modalities (either text or gaze) to labels using only a small number of manual labels. Once these labeling functions are developed, we can scale labeled training sets without additional annotation costs. To address the second roadblock, we propose to improve the robustness of ML models to hidden stratification by increasing task specificity in two exemplar medical tasks: pneumothorax detection in medical imaging and seizure detection in EEG time series data. For medical imaging, rather than training a binary classification model to detect pneumothorax (low task specificity), we first train an image segmentation model to localize pneumothorax (higher task specificity) and use the segmentation output to derive the binary prediction.
For seizure detection on EEG, we increase task specificity by training a model to classify additional attributes in the EEG, such as artifacts. We find that increasing task specificity significantly reduces reliance on non-generalizable features and improves performance on clinically meaningful subgroups, which narrows performance gaps among subgroups and results in more trustworthy models. In summary, our work investigates new forms of model supervision that are less costly than existing approaches and uncovers strong connections between task specificity and model robustness. While our experiments focus on chest X-ray classification and EEG seizure detection, our proposed methods are applicable to a wider range of healthcare applications.
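To make the idea of labeling functions over an auxiliary text modality concrete, the following is a minimal illustrative sketch and not the thesis's implementation. It assumes free-text radiology reports and a pneumothorax label. The function names, keyword rules, and example reports are hypothetical. A simple majority vote stands in for the label model that data programming would normally use to combine noisy votes.

```python
import re
from collections import Counter

# Label conventions (assumed for this sketch only).
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical negation pattern: a negation cue followed by the finding
# within the same sentence (the character class stops at a period).
NEGATED = re.compile(r"\b(no|without|negative for)\b[^.]*pneumothorax")


def lf_mentions_pneumothorax(report: str) -> int:
    """Vote POSITIVE when the finding is named and not negated."""
    text = report.lower()
    if "pneumothorax" not in text or NEGATED.search(text):
        return ABSTAIN
    return POSITIVE


def lf_negated_pneumothorax(report: str) -> int:
    """Vote NEGATIVE when the finding is explicitly negated."""
    return NEGATIVE if NEGATED.search(report.lower()) else ABSTAIN


def lf_normal_study(report: str) -> int:
    """Vote NEGATIVE on stock phrases that describe a normal study."""
    text = report.lower()
    if "lungs are clear" in text or "no acute cardiopulmonary" in text:
        return NEGATIVE
    return ABSTAIN


LABELING_FUNCTIONS = [
    lf_mentions_pneumothorax,
    lf_negated_pneumothorax,
    lf_normal_study,
]


def weak_label(report: str) -> int:
    """Combine labeling-function votes by majority; abstain on ties."""
    votes = [lf(report) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    reports = [
        "Small right apical pneumothorax.",
        "No pneumothorax. Lungs are clear.",
        "Cardiac silhouette is enlarged.",
    ]
    for r in reports:
        print(weak_label(r), "|", r)
```

In this sketch, the weak labels produced from reports would be attached to the paired images to form a training set, and reports on which every function abstains would simply be left unlabeled.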
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2023; ©2023 |
Publication date | 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Saab, Khaled Kamal |
---|---|
Degree supervisor | Re, Chris |
Degree supervisor | Rubin, Daniel |
Thesis advisor | Re, Chris |
Thesis advisor | Rubin, Daniel |
Thesis advisor | Lee-Messer, Christopher |
Thesis advisor | Pauly, John |
Thesis advisor | Pilanci, Mert |
Degree committee member | Lee-Messer, Christopher |
Degree committee member | Pauly, John |
Degree committee member | Pilanci, Mert |
Associated with | Stanford University, School of Engineering |
Associated with | Stanford University, Department of Electrical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Khaled K. Saab. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis Ph.D. Stanford University 2023. |
Location | https://purl.stanford.edu/wr751rz0386 |
Access conditions
- Copyright
- © 2023 by Khaled Kamal Saab
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).