Principled approaches to modern machine learning challenges
- Learning based models have been successfully deployed in various sciences and engineering applications. Despite the empirical successes, there is little understanding about the workings and the failure modes of these models. In this thesis, we focus on the following questions: a) when and why do these models generalize beyond the set of training data, and b) what conclusions can one reliably make when the data available is insufficient to learn accurate models. In the first part, we focus on provable generalization guarantees. We take the first steps towards understanding why deep neural networks generalize to unseen data despite having a larger number of parameters than the available data points to learn from. We characterize the ``simple" behaviour of the functions learned by neural networks in certain settings, which in turn can be shown to generalize well on unseen data. We next design algorithms for provably learning decision trees inspired from the top-down heuristics commonly used in practice. The key idea is to use a small subset of features to determine the splitting criterion at each node of the decision tree, instead of a single one as is currently used by practitioners. Our proposed algorithm is both sample and time efficient, and allows us to provide generalization guarantees for decision functions which are well represented by small decision trees. In the second part, we focus on what conclusions can we make when do not have sufficient labeled data. Learning methods, especially deep neural networks, require large amounts of labeled training data to learn meaningful predictive models. In many domains, such labeled data comes as a result of performing extensive experiments, thus restricting the amount of data one can receive. Given such limited data, one is often interested in knowing what useful conclusions can we reliably make when the data is insufficient to learn any reliable predictor. For example, if we are given the test points in advance, is it easier to predict the labels of those given test points rather than learning the entire model? In this part of the thesis, we show that given a particular test point along with a batch of unlabeled data set, one needs much fewer active label queries (independent of the complexity of the function class) to make a good prediction on this test point as compared to learning the complete function for several hypothesis classes.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Gupta, Neha, (Computer scientist)
|Degree committee member
|Stanford University, Computer Science Department
|Statement of responsibility
|Submitted to the Computer Science Department.
|Thesis Ph.D. Stanford University 2022.
- © 2022 by Neha Gupta
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
Also listed in
Loading usage metrics...