Principled approaches to modern machine learning challenges


Abstract/Contents

Abstract
Learning-based models have been successfully deployed in a variety of science and engineering applications. Despite these empirical successes, little is understood about the inner workings and failure modes of such models. In this thesis, we focus on two questions: (a) when and why do these models generalize beyond the training data, and (b) what conclusions can one reliably draw when the available data is insufficient to learn accurate models?

In the first part, we focus on provable generalization guarantees. We take first steps towards understanding why deep neural networks generalize to unseen data despite having more parameters than available training points. We characterize the "simple" behaviour of the functions learned by neural networks in certain settings, which in turn can be shown to generalize well to unseen data. We next design algorithms for provably learning decision trees, inspired by the top-down heuristics commonly used in practice. The key idea is to use a small subset of features, rather than the single feature practitioners currently use, to determine the splitting criterion at each node of the decision tree. Our proposed algorithm is both sample- and time-efficient, and allows us to provide generalization guarantees for decision functions that are well represented by small decision trees.

In the second part, we focus on what conclusions one can draw without sufficient labeled data. Learning methods, especially deep neural networks, require large amounts of labeled training data to learn meaningful predictive models. In many domains, such labels are the result of extensive experiments, which limits how much data one can obtain. Given such limited data, one is often interested in what useful conclusions can still be made reliably when the data is insufficient to learn any reliable predictor. For example, if the test points are given in advance, is it easier to predict the labels of just those points than to learn the entire model? We show that, given a particular test point along with a batch of unlabeled data, one needs far fewer active label queries (independent of the complexity of the function class) to make a good prediction on that test point than to learn the complete function, for several hypothesis classes.
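The feature-subset splitting idea from the first part can be sketched roughly as follows. This is an illustrative reconstruction, not the thesis's actual algorithm: the Gini impurity criterion, the restriction to binary features, and all function names are assumptions made for the sake of a concrete example.

```python
import random

def gini(labels):
    # Gini impurity of a list of 0/1 labels: 2 * p * (1 - p).
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y, k, rng=random):
    """Choose a split for a decision-tree node by scoring a small
    random subset of k candidate features, instead of committing to
    a single feature as classic top-down heuristics do.
    X is a list of binary feature vectors, y a list of 0/1 labels."""
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), min(k, n_features))
    best_feat, best_score = None, float("inf")
    for f in candidates:
        left = [y[i] for i in range(len(X)) if X[i][f] == 0]
        right = [y[i] for i in range(len(X)) if X[i][f] == 1]
        # Weighted impurity of the two children after splitting on f;
        # lower is better, 0.0 means the split is pure.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_feat, best_score = f, score
    return best_feat, best_score
```

For instance, on four points where the label simply copies the second feature, `best_split(X, y, k=2)` selects that feature with impurity 0.0. Growing a tree would apply this node-level choice recursively on each child's subset of the data.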

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Gupta, Neha (Computer scientist)
Degree supervisor Charikar, Moses
Degree supervisor Valiant, Gregory
Thesis advisor Charikar, Moses
Thesis advisor Valiant, Gregory
Thesis advisor Tan, Li-Yang
Degree committee member Tan, Li-Yang
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Neha Gupta.
Note Submitted to the Computer Science Department.
Thesis Ph.D., Stanford University, 2022.
Location https://purl.stanford.edu/yd023pm9683

Access conditions

Copyright
© 2022 by Neha Gupta
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
