Statistical learning under resource constraints
Abstract/Contents
- Abstract
- Statistical learning algorithms are an increasingly prevalent component of modern software systems. As such, the design of learning algorithms themselves must take into account the constraints imposed by resource-constrained applications. This dissertation explores resource-constrained learning from two distinct perspectives: learning with limited memory and learning with limited labeled data. In Part I, we consider the challenge of learning with limited memory, a constraint that frequently arises in the context of learning on mobile or embedded devices. First, we describe a randomized sketching algorithm that learns a linear classifier in a compressed, space-efficient form, i.e., by storing far fewer parameters than the dimension of the input features. Unlike typical feature hashing approaches, our method allows for the efficient recovery of the largest magnitude weights in the learned classifier, thus facilitating model interpretation and enabling several memory-efficient stream processing applications. Next, we shift our focus to unsupervised learning, where we study low-rank matrix and tensor factorization on compressed data. In this setting, we establish conditions under which a factorization computed on compressed data can be used to provably recover factors in the original, high-dimensional space. In Part II, we study the statistical constraint of learning with limited labeled data. We first present Equivariant Transformer layers, a family of differentiable image-to-image mappings that improve sample efficiency by directly incorporating prior knowledge on transformation invariances into their architecture. We then discuss a self-training algorithm for semi-supervised learning, where a small number of labeled examples is supplemented by a large collection of unlabeled data. Our method reinterprets the semi-supervised label assignment process as an optimal transportation problem between examples and classes, the solution to which can be efficiently approximated via Sinkhorn iteration. This formulation subsumes several commonly used label assignment heuristics within a single principled optimization framework.
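To make the label assignment idea in the abstract concrete, the following is a minimal sketch of Sinkhorn iteration applied to a small matrix of predicted class probabilities. It is not the dissertation's implementation. The function name `sinkhorn_plan`, the cost `-log(prob)`, the uniform class marginals, and the values of `epsilon` and `n_iters` are all assumptions chosen for illustration.

```python
def sinkhorn_plan(probs, class_marginals=None, epsilon=0.5, n_iters=200):
    """Approximate an entropy-regularized transport plan between examples and classes.

    probs: list of rows, one per unlabeled example, of predicted class probabilities.
    class_marginals: assumed target class proportions (uniform if None).
    All names and defaults here are hypothetical, for illustration only.
    """
    n, k = len(probs), len(probs[0])
    row = [1.0 / n] * n                      # each example carries equal mass
    col = class_marginals or [1.0 / k] * k   # assumed class proportions
    # Gibbs kernel exp(-cost / epsilon) with cost = -log(prob), i.e. prob ** (1 / epsilon).
    kernel = [[max(p, 1e-12) ** (1.0 / epsilon) for p in r] for r in probs]
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two sets of marginals.
        u = [row[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [col[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


# Toy predictions from a model that favors class 0 on three of four examples.
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
plan = sinkhorn_plan(probs)
hard_labels = [max(range(len(r)), key=lambda j: r[j]) for r in plan]
print(hard_labels)
```

In this toy case a plain argmax over `probs` would put three examples in class 0, whereas the balanced class marginals assumed here push the third example toward class 1. Normalizing each row of `plan` to sum to one gives soft pseudo-labels instead of hard ones.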
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Tai, Kai Sheng |
---|---|
Degree supervisor | Valiant, Gregory |
Thesis advisor | Valiant, Gregory |
Thesis advisor | Zaharia, Matei |
Degree committee member | Zaharia, Matei |
Associated with | Stanford University, Computer Science Department |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Kai Sheng Tai. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2021. |
Location | https://purl.stanford.edu/mp952br3417 |
Access conditions
- Copyright
- © 2021 by Kai Sheng Tai
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).