Statistical learning under resource constraints

Abstract/Contents

Abstract
Statistical learning algorithms are an increasingly prevalent component of modern software systems. As such, the design of learning algorithms themselves must take into account the constraints imposed by resource-constrained applications. This dissertation explores resource-constrained learning from two distinct perspectives: learning with limited memory and learning with limited labeled data.
In Part I, we consider the challenge of learning with limited memory, a constraint that frequently arises in the context of learning on mobile or embedded devices. First, we describe a randomized sketching algorithm that learns a linear classifier in a compressed, space-efficient form, i.e., by storing far fewer parameters than the dimension of the input features. Unlike typical feature hashing approaches, our method allows for the efficient recovery of the largest magnitude weights in the learned classifier, thus facilitating model interpretation and enabling several memory-efficient stream processing applications. Next, we shift our focus to unsupervised learning, where we study low-rank matrix and tensor factorization on compressed data. In this setting, we establish conditions under which a factorization computed on compressed data can be used to provably recover factors in the original, high-dimensional space.
In Part II, we study the statistical constraint of learning with limited labeled data. We first present Equivariant Transformer layers, a family of differentiable image-to-image mappings that improve sample efficiency by directly incorporating prior knowledge on transformation invariances into their architecture. We then discuss a self-training algorithm for semi-supervised learning, where a small number of labeled examples is supplemented by a large collection of unlabeled data. Our method reinterprets the semi-supervised label assignment process as an optimal transportation problem between examples and classes, the solution to which can be efficiently approximated via Sinkhorn iteration. This formulation subsumes several commonly used label assignment heuristics within a single principled optimization framework.
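
As an illustration of the label assignment formulation mentioned in the abstract, the following is a minimal sketch of Sinkhorn iteration in plain Python. It is not taken from the dissertation: the function name `sinkhorn_labels`, the choice of cost (negative log of the predicted class probability), the regularization strength `eps`, the class marginals, and the toy inputs are all assumptions made for illustration.

```python
def sinkhorn_labels(probs, class_marginals, eps=0.5, n_iters=200):
    """Soft label assignment via entropy-regularized optimal transport.

    probs: n x k list of predicted class probabilities (hypothetical input).
    class_marginals: length-k list of target class proportions (assumed known).
    Returns an n x k plan whose rows sum to roughly 1 and whose columns
    sum to n * class_marginals[j].
    """
    n, k = len(probs), len(class_marginals)
    # Assumed cost: c_ij = -log p_ij, so the Gibbs kernel exp(-c_ij / eps)
    # equals p_ij ** (1 / eps). Probabilities are clamped away from zero.
    kernel = [[max(p, 1e-12) ** (1.0 / eps) for p in row] for row in probs]
    col_target = [n * m for m in class_marginals]
    u = [1.0] * n  # row scaling factors
    v = [1.0] * k  # column scaling factors
    for _ in range(n_iters):
        # Rescale rows so that each example carries one unit of mass.
        for i in range(n):
            u[i] = 1.0 / sum(kernel[i][j] * v[j] for j in range(k))
        # Rescale columns so that each class receives its target mass.
        for j in range(k):
            v[j] = col_target[j] / sum(kernel[i][j] * u[i] for i in range(n))
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


# Toy example: four unlabeled examples, two classes, balanced class marginals.
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
plan = sinkhorn_labels(toy_probs, [0.5, 0.5])
hard_labels = [max(range(len(row)), key=row.__getitem__) for row in plan]
```

In this sketch, taking the per-row maximum of `plan` yields hard pseudo-labels, while the column constraint keeps the total mass assigned to each class at its assumed proportion.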

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Tai, Kai Sheng
Degree supervisor Valiant, Gregory
Thesis advisor Valiant, Gregory
Thesis advisor Zaharia, Matei
Degree committee member Zaharia, Matei
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Kai Sheng Tai.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2021.
Location https://purl.stanford.edu/mp952br3417

Access conditions

Copyright
© 2021 by Kai Sheng Tai
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
