Algorithms and lower bounds for parameter-free online learning

Abstract/Contents

Abstract
Training a machine learning model today involves minimizing a loss function on datasets that are often gigantic, and so almost all practically relevant training algorithms operate in an online manner by reading in small chunks of the data at a time and making updates to the model on the fly. As a result, online learning, a popular framework for analyzing optimization algorithms that operate on data streams, is at the heart of modern machine learning pipelines. In order to converge to the optimal model as quickly as possible, online learning algorithms all require some user-specified parameters that reflect the shape of the loss or statistics of the input data. Examples of such parameters include the size of the gradients of the losses, the distance from some initial model to the optimal model, and the amount of variance in the data, among others. Since the true values of these parameters are often unknown, practical implementations of online learning algorithms usually resort to guessing (called "tuning"), which is both inefficient and inelegant. This motivates the search for parameter-free algorithms that can adapt to these unknown values. Prior algorithms have achieved adaptivity to many different unknown parameters individually: for example, one may adapt to an unknown gradient size given a known distance to the optimal model, or adapt to the unknown distance given a known bound on the gradient size. However, no algorithm could adapt to both parameters simultaneously. This work introduces new lower bounds, algorithms, and analysis techniques for adapting to many parameters at once. We begin by proving a lower bound showing that adapting to both the size of the gradients and the distance to the optimal model simultaneously is fundamentally much harder than adapting to either individually, and proceed to develop the first algorithm to meet this lower bound, obtaining optimal adaptivity to both parameters at once. We then expand upon this result to design algorithms that adapt to more unknown parameters, including the variance of the data, different methods for measuring distances, and upper or lower bounds on the second derivative of the loss. We obtain these results by developing new techniques that convert non-parameter-free optimization algorithms into parameter-free ones. In addition to providing new and more adaptive algorithms, the relative simplicity of non-parameter-free algorithms allows these techniques to significantly simplify many prior analyses.
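
As a worked illustration of the tuning problem described above (standard material, not drawn from the thesis itself): for convex losses \ell_t whose gradients are bounded by G, and a comparator u within distance D of the starting point x_1, online gradient descent with a fixed step size \eta guarantees

    \mathrm{Regret}_T(u) \;=\; \sum_{t=1}^{T} \big( \ell_t(x_t) - \ell_t(u) \big) \;\le\; \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2},
    \qquad \text{minimized at } \eta = \frac{D}{G\sqrt{T}}, \text{ giving } \mathrm{Regret}_T(u) \le G D \sqrt{T}.

The optimal step size depends on both G and D, which is exactly why tuning requires prior knowledge of both quantities.

Below is a minimal sketch of one well-known prior parameter-free strategy, coin betting with the Krichevsky-Trofimov bettor: it assumes a known gradient bound (|g_t| <= 1) and adapts to the unknown distance to the best fixed point. The class name and presentation here are illustrative; this is not one of the algorithms developed in the thesis.

    class KTBettor:
        """One-dimensional parameter-free online learner via coin betting.

        Assumes |g_t| <= 1 (a known gradient bound) and adapts to the unknown
        magnitude of the best fixed point, with no step size to tune.
        """

        def __init__(self, initial_wealth=1.0):
            self.wealth = initial_wealth   # "money" available to bet
            self.grad_sum = 0.0            # running sum of -g_s over past rounds
            self.t = 0
            self.w = 0.0

        def predict(self):
            # Bet a signed fraction of current wealth; the fraction is the
            # Krichevsky-Trofimov estimate built from past gradients.
            self.t += 1
            beta = self.grad_sum / self.t  # always in (-1, 1)
            self.w = beta * self.wealth
            return self.w

        def update(self, g):
            # g is the (sub)gradient of the round-t loss at self.w, assumed |g| <= 1.
            self.wealth -= g * self.w      # Wealth_t = Wealth_{t-1} * (1 - g_t * beta_t)
            self.grad_sum += -g

Each iterate depends only on past gradients, and the wealth stays positive because |beta_t| < 1 and |g_t| <= 1; the regret against any fixed point u then scales roughly like |u| sqrt(T log(|u| T)) without knowing |u| in advance, illustrating adaptivity to an unknown distance when the gradient bound is known.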

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2018
Publication date 2018
Issuance monographic
Language English

Creators/Contributors

Author Cutkosky, Ashok
Degree supervisor Boahen, Kwabena (Kwabena Adu)
Degree supervisor Liang, Percy
Thesis advisor Boahen, Kwabena (Kwabena Adu)
Thesis advisor Liang, Percy
Thesis advisor Boneh, Dan, 1969-
Thesis advisor Duchi, John
Thesis advisor Sidford, Aaron
Degree committee member Boneh, Dan, 1969-
Degree committee member Duchi, John
Degree committee member Sidford, Aaron
Associated with Stanford University, Computer Science Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ashok Cutkosky.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Ashok Krishnan Cutkosky
