Robust, data-efficient, and trustworthy medical AI

Abstract/Contents

Abstract
Artificial intelligence (AI) has revolutionized multiple fields, including safety-critical domains such as healthcare, and has shown remarkable potential for building both diagnostic and predictive models in medicine using various types of healthcare data. Despite this potential, two major barriers stand in the way of medical AI development and its adoption into healthcare systems: 1) training AI models that perform well with a limited amount of labeled data is challenging, yet curating large labeled datasets is costly and in many cases not feasible; 2) even well-trained, state-of-the-art models with impressive accuracies on their test sets, developed with rigorous validation and testing, may fail to generalize to new patients when deployed and may be brittle under distribution shifts. This reduces trust in model capabilities and limits their adoption into clinical practice. In this thesis, I address these barriers to medical AI development and deployment, namely robustness, data efficiency, and model trust, by presenting three works that improve upon the current state of the art across various modalities.

In the first part of my thesis, I present observational supervision, a novel supervision paradigm in which passively collected, auxiliary metadata are used to train AI models. I use observational supervision to tackle the challenge of training robust, high-performing models with limited training data for clinical outcome prediction. Clinical outcome prediction models can improve medical care and aid clinical decision making, but they are typically trained on limited data, resulting in models with narrow capabilities and reduced generalization. Audit logs are an often underutilized, passively collected data source in electronic health record (EHR) systems; they capture clinicians' interactions with the EHR and represent observational signals. Our proposed method of leveraging observational supervision for structured EHR data, using audit logs in conjunction with clinical data, improves both the performance and robustness of AI models trained to predict clinical outcomes in two clinically important diseases (acute kidney injury and acute ischemic stroke), even with limited labeled training data.

In the second part of my thesis, I propose domain-specific augmentation strategies for self-supervised foundation models that enable large-scale, label-efficient training of AI models, tackling the challenges of model robustness and label efficiency. The foundation model paradigm involves pretraining models on large quantities of data in a self-supervised manner and then adapting the pretrained model to different downstream tasks, offering an opportunity to improve model robustness in a label-efficient fashion. Augmentations, or transformations of the input, are key to the success of foundation models; however, medical images differ substantially from natural images and require specialized augmentation strategies. Our proposed augmentation strategies for medical images yield a domain-specific foundation model that outperforms data-hungry, fully supervised models for chest X-ray classification and generalizes to both unseen populations and out-of-distribution data with limited labels.

In the third part of my thesis, I present TRUST-LAPSE, an explainable, post-hoc, and actionable trust-scoring framework for continuous AI model monitoring, tackling the challenge of model trust. AI models, despite their success on test sets, require label-free, continuous monitoring that can quantify trust in their predictions to ensure safe and reliable deployment. Techniques such as classical uncertainty estimation, confidence calibration, and Bayesian networks are currently employed for this purpose but suffer from several limitations. Our proposed trust-scoring framework, TRUST-LAPSE, overcomes these limitations: it determines with state-of-the-art accuracy when a deployed model's prediction can and cannot be trusted, identifies when the model encounters classes unseen during training or a change in data distribution, and accommodates various types of incoming data (vision, audio, and clinical EEG). Together, these works pave the way for developing and deploying robust, data-efficient, and trustworthy medical AI models to improve clinical care.
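To make the observational-supervision idea in the first part concrete, here is a minimal, hedged sketch of one plausible way to combine passively collected audit-log signals with clinical features for outcome prediction. The feature definitions, function names, and the logistic-regression model are illustrative assumptions, not the thesis's actual pipeline.

```python
# Hedged sketch: turn passively collected EHR audit-log events into simple
# per-encounter features and use them alongside clinical features for
# outcome prediction. The feature choices (event counts, distinct actions,
# interaction rate) and the classifier are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

def audit_log_features(events):
    """events: list of (timestamp_seconds, action_name) tuples for one encounter."""
    if not events:
        return np.zeros(3)
    times = np.array([t for t, _ in events], dtype=float)
    actions = {a for _, a in events}
    duration = max(times.max() - times.min(), 1.0)
    return np.array([
        len(events),             # total number of logged interactions
        len(actions),            # number of distinct action types
        len(events) / duration,  # interaction rate (events per second)
    ])

def fit_outcome_model(clinical_X, audit_event_lists, y):
    """Concatenate clinical features with audit-log features, then fit a classifier."""
    audit_X = np.vstack([audit_log_features(ev) for ev in audit_event_lists])
    X = np.hstack([clinical_X, audit_X])
    return LogisticRegression(max_iter=1000).fit(X, y)
```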
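The second part hinges on augmentations tailored to medical images. The following sketch shows what a two-view, contrastive-style augmentation pipeline adapted for grayscale chest X-rays might look like, assuming torchvision transforms; the specific transforms and parameter values are assumptions for illustration, not the augmentation strategies proposed in the thesis.

```python
# Illustrative sketch (not the thesis's actual recipe): a two-view,
# SimCLR-style augmentation pipeline for grayscale chest X-rays.
# Color jitter and hue shifts, common for natural images, are replaced
# with crops, flips, blur, and mild geometric changes that preserve
# radiographic semantics. All parameter values here are assumptions.
from torchvision import transforms

xray_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),  # keep most anatomy in view
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=23)], p=0.5),
    transforms.RandomApply(
        [transforms.RandomAffine(degrees=10, translate=(0.05, 0.05))], p=0.3
    ),
    transforms.ToTensor(),
])

def two_views(image):
    """Return two independently augmented views of the same X-ray,
    as used in contrastive / joint-embedding self-supervised pretraining."""
    return xray_augment(image), xray_augment(image)
```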
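For the third part, below is a minimal sketch of post-hoc, label-free trust scoring in latent space, in the spirit of (but not identical to) TRUST-LAPSE: a sample whose embedding lies far from the training embeddings of its predicted class receives a low trust score and can be flagged for review. The class structure, centroid-distance choice, and threshold handling are assumptions for illustration.

```python
# Minimal sketch of post-hoc, label-free trust scoring in latent space
# (an illustration, not TRUST-LAPSE's actual algorithm). Scores below a
# threshold chosen on validation data are flagged as untrusted.
import numpy as np

class LatentTrustScorer:
    def __init__(self, train_embeddings, train_labels):
        # Per-class centroids of training embeddings (n_samples, d)
        self.centroids = {
            c: train_embeddings[train_labels == c].mean(axis=0)
            for c in np.unique(train_labels)
        }

    def score(self, embedding, predicted_class):
        # Higher is better: negative Euclidean distance to the class centroid
        return -np.linalg.norm(embedding - self.centroids[predicted_class])

    def trusted(self, embedding, predicted_class, threshold):
        return self.score(embedding, predicted_class) >= threshold
```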

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Bhaskhar, Nandita
Degree supervisor Rubin, Daniel
Thesis advisor Rubin, Daniel
Thesis advisor Nishimura, Dwight
Thesis advisor Pauly, John
Degree committee member Nishimura, Dwight
Degree committee member Pauly, John
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Nandita Bhaskhar.
Note Submitted to the Department of Electrical Engineering.
Thesis Ph.D., Stanford University, 2023.
Location https://purl.stanford.edu/yt844dq7017

Access conditions

Copyright
© 2023 by Nandita Bhaskhar
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
