Foundation models for robust machine learning


Abstract/Contents

Abstract
Machine learning systems are not robust to distribution shifts: they suffer large drops in accuracy when deployed in environments different from those they were trained on. For example, when satellite remote sensing models are deployed in new countries, tumor detection models in new hospitals, or wildlife conservation models in new forests, their accuracy degrades substantially. In this thesis, we show that the foundation model paradigm is a principled solution that leads to state-of-the-art robustness. The foundation model paradigm consists of three steps: pretraining a model on diverse unlabeled data (e.g., satellite images from around the world) to learn general-purpose representations, adapting the model to the downstream tasks we care about, and then deploying it in the real world. This thesis focuses on understanding and improving each of these steps for robustness. (1) First, we show that pretraining on unlabeled data learns transferable representations that improve accuracy even on domains for which we had no labels. We explain why pretraining can work in a way that is very different from classical intuitions about collapsing representations across domains (domain invariance). Our theory predicts phenomena on real datasets and leads to improved pretraining methods. (2) Next, we show that the standard approach to adaptation (updating all of the model's parameters) can distort pretrained representations and perform poorly out of distribution. Our theoretical analysis leads to better methods for adaptation and state-of-the-art accuracies on ImageNet and in applications such as satellite remote sensing, wildlife conservation, and radiology. (3) Finally, when we deploy models in the real world, the data distribution evolves over time, which leads to a drop in model performance. We show that self-training on a model's own predictions can improve robustness to distribution shift, and we explain when and why self-training works.
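
To make the adaptation step concrete, the following is a minimal PyTorch-style sketch (not taken from the thesis) of one two-stage adaptation recipe in the spirit described above: first train only a linear head on a frozen pretrained backbone, then fine-tune all parameters at a smaller learning rate. The backbone, its output_dim attribute, the hyperparameters, and the data loader are illustrative assumptions, not the thesis's exact method.

# Minimal sketch of two-stage adaptation (linear probing, then full fine-tuning).
# All names and hyperparameters here are placeholders for illustration.
import torch
import torch.nn as nn

def linear_probe_then_finetune(backbone, num_classes, train_loader,
                               probe_epochs=5, finetune_epochs=5, device="cpu"):
    backbone = backbone.to(device)
    head = nn.Linear(backbone.output_dim, num_classes).to(device)  # assumes backbone exposes output_dim
    loss_fn = nn.CrossEntropyLoss()

    # Stage 1: linear probing -- freeze the pretrained backbone, train only the head.
    for p in backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(probe_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            loss = loss_fn(head(backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2: full fine-tuning -- unfreeze everything, use a smaller learning rate.
    for p in backbone.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-5)
    for _ in range(finetune_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            loss = loss_fn(head(backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    return backbone, head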

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Kumar, Ananya
Degree supervisor Liang, Percy
Degree supervisor Ma, Tengyu
Thesis advisor Liang, Percy
Thesis advisor Ma, Tengyu
Thesis advisor Finn, Chelsea
Degree committee member Finn, Chelsea
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ananya Kumar.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.), Stanford University, 2023.
Location https://purl.stanford.edu/gt661gq6831

Access conditions

Copyright
© 2023 by Ananya Kumar
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
