Robustness in machine learning and optimization, with limited structural knowledge

Abstract/Contents

Abstract
In this dissertation, we develop and analyze algorithms for robustness in three different machine learning settings.

In the first part of the dissertation, we introduce the problem of hidden stratification, in which a classification model substantially underperforms on certain unlabeled subclasses of the data, and propose a method to detect and mitigate this issue. Prior work studied how to address this problem in the setting where subclass labels are known. Based on the empirical observation that unlabeled subclasses are often separable in the feature space of deep neural networks, we instead estimate subclass labels using clustering techniques. We then use the estimated subclass labels as a form of noisy supervision in a distributionally robust optimization objective, in order to train a model that is more robust to inter-subclass variation. We demonstrate the effectiveness of our approach on several robust image classification benchmarks. We briefly discuss alternative methods for 1) utilizing a limited number of subclass labels to further improve performance, and 2) using contrastive learning to learn representations less susceptible to hidden stratification.

In the second part of the dissertation, we study the problem of evaluating classification models under structured distribution shifts. Given a labeled sample from a "source" distribution and an unlabeled sample from the "target" distribution, importance weighting is the standard approach to such evaluations; however, importance weighting can struggle in high-dimensional settings, and it fails when the support of the target distribution is not contained in that of the source. We show that one can sidestep these issues with some foreknowledge of the nature of the distribution shift; specifically, we present an algorithm that uses user-defined "slicing functions" — binary functions intended to capture possible axes of distribution shift — to estimate performance on the target distribution. We theoretically characterize the robustness of our approach to noise and incompleteness in the slicing functions, and empirically verify its effectiveness on a variety of classification tasks.

In the third part of the dissertation, we develop an accelerated gradient method to efficiently minimize a class of smooth, structured nonconvex functions that we term "quasar-convex" functions. Our algorithm is a generalization of the classic accelerated gradient descent method for convex functions, and it is robust to possible nonconvexity between algorithm iterates. We provide upper and lower bounds on the number of first-order evaluations our algorithm requires to find an approximate optimum; these bounds show that its complexity is optimal up to logarithmic factors.
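As a concrete illustration of the first part's two-step recipe, the sketch below clusters each class's feature vectors to obtain pseudo-subclass labels and then evaluates a worst-group objective over them. This is a minimal sketch, not the dissertation's actual code: the function names are hypothetical, features and labels are assumed to be available as NumPy arrays, and the worst-group maximum stands in for a full distributionally robust training loop.

import numpy as np
from sklearn.cluster import KMeans

def estimate_subclass_labels(features, class_labels, n_subclasses=2, seed=0):
    # Cluster each class's feature vectors separately; the clusters serve as
    # noisy pseudo-subclass labels (IDs are made globally unique across classes).
    subclass_labels = np.zeros(len(features), dtype=int)
    next_id = 0
    for c in np.unique(class_labels):
        idx = np.flatnonzero(class_labels == c)
        km = KMeans(n_clusters=n_subclasses, n_init=10, random_state=seed)
        subclass_labels[idx] = next_id + km.fit_predict(features[idx])
        next_id += n_subclasses
    return subclass_labels

def worst_group_loss(per_example_losses, subclass_labels):
    # Robust objective: the largest mean loss over any estimated subclass.
    return max(per_example_losses[subclass_labels == g].mean()
               for g in np.unique(subclass_labels))

Because the clustering is imperfect, the estimated labels are treated as noisy supervision: the robust objective bounds how badly any one estimated subclass can be served, rather than trusting the labels outright.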
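The second part's slicing-function idea can likewise be illustrated in a few lines. The sketch below stratifies on the joint pattern of binary slice outputs and reweights source examples so that the pattern frequencies match the target's. It is a simplified stand-in for the dissertation's estimator, with hypothetical names, and slice matrices of shape (n_examples, n_slices) are assumed.

import numpy as np

def slice_patterns(slice_matrix):
    # Encode each example's binary slice vector as a string key, e.g. "101".
    return np.array([''.join(str(int(v)) for v in row) for row in slice_matrix])

def estimate_target_accuracy(source_slices, source_correct, target_slices):
    # Weight each source example by the ratio of target to source frequency
    # of its slice pattern, then take the weighted mean of correctness.
    src = slice_patterns(source_slices)
    tgt = slice_patterns(target_slices)
    weights = np.zeros(len(src))
    for pattern in np.unique(tgt):
        mask = (src == pattern)
        if mask.any():  # strata with no source support must be skipped here
            weights[mask] = np.mean(tgt == pattern) / mask.mean()
    return np.average(source_correct, weights=weights)

Note that the weights depend only on the slice representation, not on the raw inputs; this is what sidesteps high-dimensional density-ratio estimation, at the cost of depending on how noisy and complete the slicing functions are, which is exactly the regime the dissertation analyzes.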
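For the third part, the function class admits a one-line definition, stated here as the term is standardly used: a differentiable function f with minimizer x* is gamma-quasar-convex if

\[
  f(x^\star) \;\ge\; f(x) + \frac{1}{\gamma}\,\nabla f(x)^\top (x^\star - x)
  \qquad \text{for all } x, \text{ where } \gamma \in (0, 1].
\]

Setting \(\gamma = 1\) recovers star-convexity, and every smooth convex function satisfies the inequality; smaller \(\gamma\) admits more nonconvexity along the path to the minimizer, which is precisely the nonconvexity between iterates that the accelerated method must tolerate.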

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Sohoni, Nimit Sharad
Degree supervisor Ré, Chris
Thesis advisor Ré, Chris
Thesis advisor Hashimoto, Tatsunori
Thesis advisor Sidford, Aaron
Degree committee member Hashimoto, Tatsunori
Degree committee member Sidford, Aaron
Associated with Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Nimit Sohoni.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/vm168rv6853

Access conditions

Copyright
© 2022 by Nimit Sharad Sohoni
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
