Algorithms and lower bounds for parameter-free online learning

Abstract/Contents

Abstract
Training a machine learning model today involves minimizing a loss function on datasets that are often gigantic, and so almost all practically relevant training algorithms operate in an online manner by reading in small chunks of the data at a time and making updates to the model on the fly. As a result, online learning, a popular framework for analyzing optimization algorithms that operate on data streams, is at the heart of modern machine learning pipelines. In order to converge to the optimal model as quickly as possible, online learning algorithms all require some user-specified parameters that reflect the shape of the loss or statistics of the input data. Examples of such parameters include the size of the gradients of the losses, the distance from some initial model to the optimal model, and the amount of variance in the data, among others. Since the true values of these parameters are often unknown, practical implementations of online learning algorithms usually resort to guessing (called "tuning"), which is both inefficient and inelegant. This motivates the search for parameter-free algorithms that can adapt to these unknown values. Prior algorithms have achieved adaptivity to many different unknown parameters individually: for example, one may adapt to an unknown gradient size given a known distance to the optimal model, or adapt to the unknown distance given a known bound on the gradient size. However, no algorithm could adapt to both parameters simultaneously. This work introduces new lower bounds, algorithms, and analysis techniques for adapting to many parameters at once. We begin by proving a lower bound showing that adapting to both the size of the gradients and the distance to the optimal model simultaneously is fundamentally much harder than adapting to either individually, and proceed to develop the first algorithm to meet this lower bound, obtaining optimal adaptivity to both parameters at once. We then expand upon this result to design algorithms that adapt to more unknown parameters, including the variance of the data, different methods for measuring distances, and upper or lower bounds on the second derivative of the loss. We obtain these results by developing new techniques that convert non-parameter-free optimization algorithms into parameter-free ones. In addition to providing new and more adaptive algorithms, the relative simplicity of non-parameter-free algorithms allows these techniques to significantly simplify many prior analyses.
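
As a worked illustration of the tuning problem described above (standard material, not drawn from the thesis itself): for convex losses \ell_t whose gradients are bounded by G, and a comparator u within distance D of the starting point x_1, online gradient descent with a fixed step size \eta guarantees

    \mathrm{Regret}_T(u) \;=\; \sum_{t=1}^{T} \big( \ell_t(x_t) - \ell_t(u) \big) \;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2},
    \qquad \text{minimized at } \eta = \frac{D}{G\sqrt{T}}, \text{ giving } \mathrm{Regret}_T(u) \le G D \sqrt{T}.

The optimal step size depends on both G and D, which is exactly why tuning requires prior knowledge of both quantities.

Below is a minimal sketch of one well-known prior parameter-free strategy, coin betting with the Krichevsky-Trofimov bettor: it assumes a known gradient bound (|g_t| <= 1) and adapts to the unknown distance to the best fixed point. The class name and presentation here are illustrative; this is not one of the algorithms developed in the thesis.

    class KTBettor:
        """One-dimensional parameter-free online learner via coin betting.

        Assumes |g_t| <= 1 (a known gradient bound) and adapts to the unknown
        magnitude of the best fixed point, with no step size to tune.
        """

        def __init__(self, initial_wealth=1.0):
            self.wealth = initial_wealth   # "money" available to bet
            self.grad_sum = 0.0            # running sum of -g_s over past rounds
            self.t = 0
            self.w = 0.0

        def predict(self):
            # Bet a signed fraction of current wealth; the fraction is the
            # Krichevsky-Trofimov estimate built from past gradients.
            self.t += 1
            beta = self.grad_sum / self.t  # always in (-1, 1)
            self.w = beta * self.wealth
            return self.w

        def update(self, g):
            # g is the (sub)gradient of the round-t loss at self.w, assumed |g| <= 1.
            self.wealth -= g * self.w      # Wealth_t = Wealth_{t-1} * (1 - g_t * beta_t)
            self.grad_sum += -g

Each iterate depends only on past gradients, and the wealth stays positive because |beta_t| < 1 and |g_t| <= 1; the regret against any fixed point u then scales roughly like |u| sqrt(T log(|u| T)) without knowing |u| in advance, illustrating adaptivity to an unknown distance when the gradient bound is known.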

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2018
Publication date 2018
Issuance monographic
Language English

Creators/Contributors

Author Cutkosky, Ashok
Degree supervisor Boahen, Kwabena (Kwabena Adu)
Degree supervisor Liang, Percy
Thesis advisor Boahen, Kwabena (Kwabena Adu)
Thesis advisor Liang, Percy
Thesis advisor Boneh, Dan, 1969-
Thesis advisor Duchi, John
Thesis advisor Sidford, Aaron
Degree committee member Boneh, Dan, 1969-
Degree committee member Duchi, John
Degree committee member Sidford, Aaron
Associated with Stanford University, Computer Science Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ashok Cutkosky.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Ashok Krishnan Cutkosky
