Fast stochastic algorithms for machine learning


Abstract/Contents

Abstract
Machine learning problems typically involve large amounts of data. To process these large datasets efficiently on modern hardware, practitioners typically rely on one of a handful of popular ML algorithms, such as stochastic gradient descent (SGD), which is used for almost all learning applications including deep learning, and Gibbs sampling, which is the de facto Markov chain Monte Carlo method used for statistical inference. Beyond their current popularity, these algorithms are classics that have been used in machine learning and other domains for decades, so it is important to understand their limits and to develop techniques that make them run fast on modern hardware. SGD and Gibbs sampling are similar in that both are stochastic: they leverage random sampling to compute using only a small subsample of the large input dataset at a time. As a result, these algorithms are inherently noisy and do not produce a deterministic output; instead, they produce noisy output that is not necessarily accurate, which means that practitioners cannot always be confident in the results of their analysis. To address this, we need to develop conditions under which, because of the structure of the problem, these algorithms can be guaranteed to output accurate solutions. In this dissertation, we will develop two such conditions, one for Gibbs sampling and one for SGD, which prove that these algorithms can be effective for problems for which guarantees were not previously known.

Another consequence of the stochastic nature of SGD and Gibbs sampling is that it helps us design techniques for making these algorithms fast. Because these algorithms are statistical, they are naturally error-tolerant, so we can often freely use techniques that improve throughput at the cost of introducing some amount of error. One popular technique of this type is HOGWILD, which speeds up a stochastic algorithm by parallelizing it across multiple threads without any synchronization or locking. The absence of locking means that the algorithm will experience errors in the form of race conditions, but it can be proven in some cases that these race conditions do not negatively affect the convergence of the algorithm. In this dissertation, we will prove that HOGWILD can be used to speed up SGD and Gibbs sampling in cases for which no previous guarantees were known.

Another technique of this type is low-precision computation, which replaces the 32-bit or 64-bit floating-point numbers traditionally used in machine learning applications with 8-bit or 16-bit fixed-point numbers. Low-precision computation makes algorithms faster because lower-precision numbers are cheaper to compute with, transfer, and store. Just like HOGWILD, this technique introduces noise in the form of quantization error, but it can be shown that under some conditions this noise, too, does not affect convergence. In this dissertation, we will prove that low-precision arithmetic can be used in several settings in which it had not previously been proven to work, including for algorithms that combine low-precision arithmetic with HOGWILD, a technique we call BUCKWILD. Given that we are going to use BUCKWILD to make a stochastic algorithm fast, it is important to understand how to implement it most efficiently and how to reason about the tradeoffs that arise when designing a BUCKWILD system. In this dissertation, we will address this issue by proposing a new way to conceptualize how precision is used in a stochastic gradient descent implementation. We will also describe several software optimizations and hardware changes that can significantly improve the throughput of BUCKWILD SGD. We hope that these techniques, added to our theoretical guarantees, will enable future practitioners to develop very fast stochastic algorithms that are still guaranteed to produce accurate results.
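To make the combination described above concrete, the following is a minimal sketch of a BUCKWILD-style update loop: several threads run SGD on a shared model stored in 8-bit fixed point, with no locking between them, and each write is quantized with stochastic rounding. The least-squares model, dimensions, scale factor, learning rate, and all names here are illustrative assumptions, not the implementation evaluated in the dissertation.

```cpp
// Sketch of a BUCKWILD-style worker: HOGWILD-style lock-free parallel SGD on a
// model stored in 8-bit fixed point. Illustrative assumptions throughout.
#include <cmath>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>
#include <vector>

constexpr int   kDim   = 64;             // model dimension (assumed)
constexpr float kScale = 1.0f / 256.0f;  // fixed point: weight = q * kScale
constexpr float kLr    = 0.05f;          // learning rate (assumed)

// Shared model: one quantized int8 value per coordinate. Threads read and
// write these entries with no locks; the resulting races are tolerated.
// (A production implementation would typically use relaxed atomics here.)
std::vector<std::int8_t> model(kDim, 0);

// Quantize a real-valued weight onto the int8 grid with stochastic rounding,
// so that the rounding error is zero in expectation.
std::int8_t stochastic_round(float w, std::mt19937 &rng) {
    float scaled = w / kScale;
    float lo = std::floor(scaled);
    std::uniform_real_distribution<float> u(0.0f, 1.0f);
    float q = lo + ((u(rng) < scaled - lo) ? 1.0f : 0.0f);
    if (q < -128.0f) q = -128.0f;   // saturate to the int8 range
    if (q >  127.0f) q =  127.0f;
    return static_cast<std::int8_t>(q);
}

// One worker: repeatedly sample an example, compute a stochastic gradient of
// the squared loss in float, and write the quantized step back unsynchronized.
void worker(const std::vector<std::vector<float>> &X,
            const std::vector<float> &y,
            int num_steps, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    for (int t = 0; t < num_steps; ++t) {
        std::size_t i = pick(rng);
        float pred = 0.0f;                       // dequantize and predict
        for (int j = 0; j < kDim; ++j)
            pred += model[j] * kScale * X[i][j];
        float err = pred - y[i];
        for (int j = 0; j < kDim; ++j) {
            float w = model[j] * kScale;         // racy read
            model[j] = stochastic_round(w - kLr * err * X[i][j], rng);  // racy write
        }
    }
}

// Launch several workers on the same shared model, HOGWILD-style: no locks,
// no coordination, every thread updates whichever coordinates it touches.
void run_buckwild(const std::vector<std::vector<float>> &X,
                  const std::vector<float> &y,
                  int num_threads, int steps_per_thread) {
    std::vector<std::thread> threads;
    for (int k = 0; k < num_threads; ++k)
        threads.emplace_back(worker, std::cref(X), std::cref(y),
                             steps_per_thread, 1234u + k);
    for (auto &t : threads) t.join();
}
```

The rounding scheme is the design choice that matters in this sketch: stochastic rounding keeps the quantization noise zero-mean, which is the kind of structure that allows quantization error, like the race conditions, to be treated as just another tolerable source of noise in the analysis.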

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2017
Issuance monographic
Language English

Creators/Contributors

Associated with De Sa, Christopher
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Olukotun, Oyekunle Ayinde
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Boyd, Stephen
Thesis advisor Ré, Christopher

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Christopher De Sa.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2017.
Location electronic resource

Access conditions

Copyright
© 2017 by Christopher Matthew De Sa
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
