Statistical learning under resource constraints

Tai, Kai Sheng

Abstract/Contents

Abstract: Statistical learning algorithms are an increasingly prevalent component of modern software systems. As such, the design of learning algorithms themselves must take into account the constraints imposed by resource-constrained applications. This dissertation explores resource-constrained learning from two distinct perspectives: learning with limited memory and learning with limited labeled data. In Part I, we consider the challenge of learning with limited memory, a constraint that frequently arises in the context of learning on mobile or embedded devices. First, we describe a randomized sketching algorithm that learns a linear classifier in a compressed, space-efficient form, i.e., by storing far fewer parameters than the dimension of the input features. Unlike typical feature hashing approaches, our method allows for the efficient recovery of the largest magnitude weights in the learned classifier, thus facilitating model interpretation and enabling several memory-efficient stream processing applications. Next, we shift our focus to unsupervised learning, where we study low-rank matrix and tensor factorization on compressed data. In this setting, we establish conditions under which a factorization computed on compressed data can be used to provably recover factors in the original, high-dimensional space. In Part II, we study the statistical constraint of learning with limited labeled data. We first present Equivariant Transformer layers, a family of differentiable image-to-image mappings that improve sample efficiency by directly incorporating prior knowledge on transformation invariances into their architecture. We then discuss a self-training algorithm for semi-supervised learning, where a small number of labeled examples is supplemented by a large collection of unlabeled data. Our method reinterprets the semi-supervised label assignment process as an optimal transportation problem between examples and classes, the solution to which can be efficiently approximated via Sinkhorn iteration. This formulation subsumes several commonly used label assignment heuristics within a single principled optimization framework.
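
Illustrative note: the optimal-transport view of label assignment mentioned in the abstract can be sketched with a generic Sinkhorn iteration. The Python below is a minimal illustration and not the dissertation's implementation. The function name `sinkhorn_assign`, the regularization strength `epsilon`, the uniform mass per example, and the target class proportions `class_marginals` are all assumptions chosen for this example.

```python
import math

def sinkhorn_assign(scores, class_marginals, epsilon=0.1, n_iters=200):
    """Approximate an entropy-regularized transport plan from examples to classes.

    scores[i][j]: hypothetical model score (higher is better) for example i, class j.
    class_marginals[j]: assumed fraction of total mass assigned to class j (sums to 1).
    Returns a plan whose columns sum to class_marginals and whose rows
    sum approximately to 1/n once the iteration has converged.
    """
    n, k = len(scores), len(scores[0])
    row_target = [1.0 / n] * n  # assumption: every example carries equal mass

    # Gibbs kernel with cost = -score. Subtracting the max score avoids overflow;
    # scores are assumed to span a moderate range relative to epsilon.
    top = max(max(row) for row in scores)
    K = [[math.exp((s - top) / epsilon) for s in row] for row in scores]

    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows to match the per-example mass, then columns to match classes.
        u = [row_target[i] / sum(K[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [class_marginals[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(k)]

    return [[u[i] * K[i][j] * v[j] for j in range(k)] for i in range(n)]


# Toy example: a plain argmax on the scores would put three of four examples in class 0.
scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.2, 0.8]]
plan = sinkhorn_assign(scores, class_marginals=[0.5, 0.5])
labels = [max(range(len(row)), key=row.__getitem__) for row in plan]
print(labels)  # [0, 0, 1, 1]
```

With balanced class proportions, the plan moves the least confident class-0 example (the third one) to class 1, which a per-example argmax would not do. Setting `class_marginals` differently changes which heuristic the assignment resembles.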

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2021
Publication date	2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Tai, Kai Sheng
Degree supervisor	Valiant, Gregory
Thesis advisor	Valiant, Gregory
Thesis advisor	Zaharia, Matei
Degree committee member	Zaharia, Matei
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Kai Sheng Tai.
Note	Submitted to the Computer Science Department.
Thesis	Ph.D., Stanford University, 2021.
Location	https://purl.stanford.edu/mp952br3417

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
