Algorithms and lower bounds for parameter-free online learning
Abstract/Contents
- Abstract
- Training a machine learning model today involves minimizing a loss function on datasets that are often gigantic, and so almost all practically relevant training algorithms operate in an online manner, reading in small chunks of the data at a time and updating the model on the fly. As a result, online learning, a popular framework for analyzing optimization algorithms that operate on data streams, is at the heart of modern machine learning pipelines. To converge to the optimal model as quickly as possible, online learning algorithms require user-specified parameters that reflect the shape of the loss or statistics of the input data. Examples of such parameters include the size of the gradients of the losses, the distance from some initial model to the optimal model, and the amount of variance in the data, among others. Since the true values of these parameters are often unknown, practical implementations of online learning algorithms usually resort to guessing (called "tuning"), which is both inefficient and inelegant. This motivates the search for parameter-free algorithms that can adapt to these unknown values. Prior algorithms have achieved adaptivity to many different unknown parameters individually: for example, one may adapt to unknown gradient sizes given a known distance to the optimal model, or adapt to an unknown distance given a known bound on gradient size. However, no algorithm could adapt to both parameters simultaneously. This work introduces new lower bounds, algorithms, and analysis techniques for adapting to many parameters at once. We begin by proving a lower bound showing that adapting simultaneously to both the size of the gradients and the distance to the optimal model is fundamentally much harder than adapting to either individually, and we then develop the first algorithm to meet this lower bound, obtaining optimal adaptivity to both parameters at once.
We then expand upon this result to design algorithms that adapt to more unknown parameters, including the variance of the data, different methods for measuring distances, and upper or lower bounds on the second derivative of the loss. We obtain these results by developing new techniques that convert non-parameter-free optimization algorithms into parameter-free algorithms. In addition to providing new and more adaptive algorithms, the relative simplicity of non-parameter-free algorithms allows these techniques to significantly reduce the complexity of many prior analyses.
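To make the tuning problem described above concrete, here is a minimal sketch of classical online gradient descent, whose standard step size depends on exactly the two parameters the abstract discusses: a bound G on gradient sizes and a bound D on the distance to the optimal model. All names (`ogd`, `grad_fn`, the toy quadratic stream) are illustrative, not taken from the thesis; parameter-free algorithms aim to remove the need to supply G and D in advance.

```python
import math

def ogd(grad_fn, T, D, G, x0=0.0):
    """Online (sub)gradient descent on a 1-D loss stream.

    The step size eta = D / (G * sqrt(T)) is the classical tuning that
    requires knowing the comparator distance D and the gradient bound G
    up front; misestimating either degrades the regret guarantee.
    """
    eta = D / (G * math.sqrt(T))
    x = x0
    iterates = []
    for _ in range(T):
        iterates.append(x)
        g = grad_fn(x)          # receive a (sub)gradient from the stream
        x -= eta * g            # gradient step
        x = max(-D, min(D, x))  # project onto the known-diameter ball [-D, D]
    return iterates

# Toy stream: every round plays the loss (x - 1)^2 / 2, so the gradient is x - 1.
grad = lambda x: x - 1.0
xs = ogd(grad, T=1000, D=2.0, G=3.0)
avg = sum(xs) / len(xs)  # the averaged iterate approaches the optimum x* = 1
```

With correct values of D and G this tuning yields the optimal regret rate of order DG√T; the thesis's subject is achieving comparable guarantees without being told D and G.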
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2018 |
Publication date | 2018 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Cutkosky, Ashok |
---|---|
Degree supervisor | Boahen, Kwabena (Kwabena Adu) |
Degree supervisor | Liang, Percy |
Thesis advisor | Boahen, Kwabena (Kwabena Adu) |
Thesis advisor | Liang, Percy |
Thesis advisor | Boneh, Dan, 1969- |
Thesis advisor | Duchi, John |
Thesis advisor | Sidford, Aaron |
Degree committee member | Boneh, Dan, 1969- |
Degree committee member | Duchi, John |
Degree committee member | Sidford, Aaron |
Associated with | Stanford University, Computer Science Department. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Ashok Cutkosky. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Thesis (Ph.D.)--Stanford University, 2018.
Location | electronic resource |
Access conditions
- Copyright
- © 2018 by Ashok Krishnan Cutkosky