Lightweight statistical learning: accelerating and avoiding empirical risk minimization
Abstract/Contents
- Abstract
- In statistical machine learning, the goal is to train a model that, once deployed in the world, continues to predict accurately on fresh data. A unifying training principle is empirical risk minimization (ERM): globally minimizing the average prediction error incurred on a representative training set. Essentially, ERM prescribes optimization as a proxy for statistical estimation. Learning tasks for which ERM is computationally tractable call for optimization algorithms that scale as efficiently as possible as the training set grows. Other tasks---namely, neural network learning and learning with indirect supervision---elude a general, tractable ERM algorithm in theory, motivating a reformulation of the training problem altogether. In this thesis, we first focus on efficient algorithms for empirical risk minimization in the overdetermined, convex setting with fully observed data, where ERM is tractable and the aim is optimal computational efficiency. We improve the guaranteed running time for a broad range of problems (including basic regression problems) in terms of the condition number induced by the underlying training set. Atop these methods, we develop an algorithm for principal component regression. Next, we move to settings where general tractability of ERM is not guaranteed but still guides algorithm design in practice. Specifically, we study two learning frameworks: convolutional neural network learning, and learning conditional random fields from indirect supervision. In both settings, the prevalent replacement for global ERM is gradient descent. Since gradient descent can converge to arbitrarily bad local optima, initialization becomes especially relevant. In each setting, we develop lightweight training algorithms based on convex procedures. For neural networks, we consider replacing the optimization of hidden layers with randomization. For indirect supervision, we reduce estimation to solving a linear system followed by a convex maximum-likelihood step. Via the role of initialization, both algorithms imply conditions under which local descent ought to perform at least as well as they do.
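As a minimal sketch of the two ideas the abstract combines---convex ERM and randomized hidden layers---the following toy example (an illustration, not the thesis's actual algorithm) draws a random, untrained hidden layer and then fits only the output weights by least-squares empirical risk minimization, which is a convex problem:

```python
import numpy as np

# Sketch: fix a hidden layer at random (never trained), then train only
# the output layer by convex empirical risk minimization -- here,
# minimizing the average squared prediction error via least squares.

rng = np.random.default_rng(0)
n, d, k = 500, 5, 200              # samples, input features, random hidden units

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # nonlinear target

# Randomized hidden layer: random weights with a ReLU nonlinearity.
W = rng.normal(size=(d, k))
H = np.maximum(X @ W, 0.0)

def empirical_risk(w):
    """Average squared prediction error over the training set."""
    return np.mean((H @ w - y) ** 2)

# Convex ERM step: least-squares fit of the output weights alone.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)

# The fitted output layer attains lower empirical risk than the
# trivial all-zeros predictor.
assert empirical_risk(w_out) < empirical_risk(np.zeros(k))
```

Since the hidden layer is frozen, the remaining problem is an ordinary overdetermined least-squares instance, exactly the convex, fully observed regime the abstract treats first.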
Description
Type of resource | text
---|---
Form | electronic; electronic resource; remote
Extent | 1 online resource.
Publication date | 2017
Issuance | monographic
Language | English
Creators/Contributors
Associated with | Frostig, Roy
---|---
Associated with | Stanford University, Computer Science Department.
Primary advisor | Liang, Percy
Thesis advisor | Duchi, John
Thesis advisor | Valiant, Gregory
Subjects
Genre | Theses
---|---
Bibliographic information
Statement of responsibility | Roy Frostig.
---|---
Note | Submitted to the Department of Computer Science.
Thesis | Thesis (Ph.D.)--Stanford University, 2017.
Location | electronic resource
Access conditions
- Copyright
- © 2017 by Roy Frostig