Robustness in machine learning and optimization, with limited structural knowledge

Abstract/Contents

Abstract
In this dissertation, we develop and analyze algorithms for robustness in three different machine learning settings.

In the first part of the dissertation, we introduce the problem of hidden stratification, in which a classification model substantially underperforms on certain unlabeled subclasses of the data, and propose a method to detect and mitigate this issue. Prior work studied how to address this problem in the setting where subclass labels are known. Based on the empirical observation that unlabeled subclasses are often separable in the feature space of deep neural networks, we instead estimate subclass labels using clustering techniques. We then use the estimated subclass labels as a form of noisy supervision in a distributionally robust optimization objective, in order to train a model that is more robust to inter-subclass variation. We demonstrate the effectiveness of our approach on several robust image classification benchmarks. We briefly discuss alternative methods for 1) utilizing a limited number of subclass labels to further improve performance, and 2) using contrastive learning to learn representations less susceptible to hidden stratification.

In the second part of the dissertation, we study the problem of evaluating classification models under structured distribution shifts. Given a labeled sample from a "source" distribution and an unlabeled sample from the "target" distribution, importance weighting is the standard approach to such evaluations; however, importance weighting can struggle in high-dimensional settings, and it fails when the support of the target distribution is not contained in that of the source. We show that one can sidestep these issues with some foreknowledge of the nature of the distribution shift; specifically, we present an algorithm that uses user-defined "slicing functions" — binary functions intended to capture possible axes of distribution shift — to estimate performance on the target distribution. We theoretically characterize the robustness of our approach to noise and incompleteness in the slicing functions, and empirically verify its effectiveness on a variety of classification tasks.

In the third part of the dissertation, we develop an accelerated gradient method to efficiently minimize a class of smooth, structured nonconvex functions that we term "quasar-convex" functions. Our algorithm is a generalization of the classic accelerated gradient descent method for convex functions, and it is robust to possible nonconvexity between algorithm iterates. We provide upper and lower bounds on the number of first-order evaluations our algorithm requires to find an approximate optimum; these bounds show that its complexity is optimal up to logarithmic factors.
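As a concrete illustration of the first part's two-step recipe, the sketch below clusters each class's feature vectors to obtain pseudo-subclass labels and then evaluates a worst-group objective over them. This is a minimal sketch, not the dissertation's actual code: the function names are hypothetical, features and labels are assumed to be available as NumPy arrays, and the worst-group maximum stands in for a full distributionally robust training loop.

import numpy as np
from sklearn.cluster import KMeans

def estimate_subclass_labels(features, class_labels, n_subclasses=2, seed=0):
    # Cluster each class's feature vectors separately; the clusters serve as
    # noisy pseudo-subclass labels (IDs are made globally unique across classes).
    subclass_labels = np.zeros(len(features), dtype=int)
    next_id = 0
    for c in np.unique(class_labels):
        idx = np.flatnonzero(class_labels == c)
        km = KMeans(n_clusters=n_subclasses, n_init=10, random_state=seed)
        subclass_labels[idx] = next_id + km.fit_predict(features[idx])
        next_id += n_subclasses
    return subclass_labels

def worst_group_loss(per_example_losses, subclass_labels):
    # Robust objective: the largest mean loss over any estimated subclass.
    return max(per_example_losses[subclass_labels == g].mean()
               for g in np.unique(subclass_labels))

Because the clustering is imperfect, the estimated labels are treated as noisy supervision: the robust objective bounds how badly any one estimated subclass can be served, rather than trusting the labels outright.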
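The second part's slicing-function idea can likewise be illustrated in a few lines. The sketch below stratifies on the joint pattern of binary slice outputs and reweights source examples so that the pattern frequencies match the target's. It is a simplified stand-in for the dissertation's estimator, with hypothetical names, and slice matrices of shape (n_examples, n_slices) are assumed.

import numpy as np

def slice_patterns(slice_matrix):
    # Encode each example's binary slice vector as a string key, e.g. "101".
    return np.array([''.join(str(int(v)) for v in row) for row in slice_matrix])

def estimate_target_accuracy(source_slices, source_correct, target_slices):
    # Weight each source example by the ratio of target to source frequency
    # of its slice pattern, then take the weighted mean of correctness.
    src = slice_patterns(source_slices)
    tgt = slice_patterns(target_slices)
    weights = np.zeros(len(src))
    for pattern in np.unique(tgt):
        mask = (src == pattern)
        if mask.any():  # strata with no source support must be skipped here
            weights[mask] = np.mean(tgt == pattern) / mask.mean()
    return np.average(source_correct, weights=weights)

Note that the weights depend only on the slice representation, not on the raw inputs; this is what sidesteps high-dimensional density-ratio estimation, at the cost of depending on how noisy and complete the slicing functions are, which is exactly the regime the dissertation analyzes.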
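For the third part, the function class admits a one-line definition, stated here as the term is standardly used: a differentiable function f with minimizer x* is gamma-quasar-convex if

\[
  f(x^\star) \;\ge\; f(x) + \frac{1}{\gamma}\,\nabla f(x)^\top (x^\star - x)
  \qquad \text{for all } x, \text{ where } \gamma \in (0, 1].
\]

Setting \(\gamma = 1\) recovers star-convexity, and every smooth convex function satisfies the inequality; smaller \(\gamma\) admits more nonconvexity along the path to the minimizer, which is precisely the nonconvexity between iterates that the accelerated method must tolerate.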

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Sohoni, Nimit Sharad
Degree supervisor Ré, Chris
Thesis advisor Ré, Chris
Thesis advisor Hashimoto, Tatsunori
Thesis advisor Sidford, Aaron
Degree committee member Hashimoto, Tatsunori
Degree committee member Sidford, Aaron
Associated with Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Nimit Sohoni.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/vm168rv6853

Access conditions

Copyright
© 2022 by Nimit Sharad Sohoni
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
