Fast stochastic algorithms for machine learning


Abstract/Contents

Abstract
Machine learning problems typically involve large amounts of data. To process these large datasets efficiently on modern hardware, practitioners typically rely on one of a handful of popular ML algorithms, such as stochastic gradient descent (SGD), which is used for almost all learning applications including deep learning, and Gibbs sampling, which is the de facto Markov chain Monte Carlo method used for statistical inference. Beyond their current popularity, these algorithms are classics that have been used in machine learning and other domains for decades, so it is important to understand their limits and to develop techniques that make them run fast on modern hardware. SGD and Gibbs sampling are similar in that both are stochastic: they leverage random sampling to compute using only a small subsample of the large input dataset at a time. As a result, these algorithms are inherently noisy and do not produce a deterministic output; instead, they produce noisy output that is not necessarily accurate, which means that practitioners cannot always be confident in the results of their analysis. To address this, we need to develop conditions under which, because of the structure of the problem, these algorithms can be guaranteed to output accurate solutions. In this dissertation, we will develop two such conditions, one for Gibbs sampling and one for SGD, which prove that these algorithms can be effective for problems for which guarantees were not previously known.

Another consequence of the stochastic nature of SGD and Gibbs sampling is that it helps us design techniques for making these algorithms fast. Because these algorithms are statistical, they are naturally error-tolerant, so we can often freely use techniques that improve throughput at the cost of introducing some amount of error. One popular technique of this type is HOGWILD, which speeds up a stochastic algorithm by parallelizing it across multiple threads without any synchronization or locking. The absence of locking means that the algorithm will experience errors in the form of race conditions, but it can be proven in some cases that these race conditions do not negatively affect the convergence of the algorithm. In this dissertation, we will prove that HOGWILD can be used to speed up SGD and Gibbs sampling in cases for which no previous guarantees were known.

Another technique of this type is low-precision computation, which replaces the 32-bit or 64-bit floating-point numbers traditionally used in machine learning applications with 8-bit or 16-bit fixed-point numbers. Low-precision computation makes algorithms faster because lower-precision numbers are cheaper to compute with, transfer, and store. Just like HOGWILD, this technique introduces noise in the form of quantization error, but it can be shown that under some conditions this noise, too, does not affect convergence. In this dissertation, we will prove that low-precision arithmetic can be used in several settings in which it had not previously been proven to work, including for algorithms that combine low-precision arithmetic with HOGWILD, a technique we call BUCKWILD. Given that we are going to use BUCKWILD to make a stochastic algorithm fast, it is important to understand how to implement it most efficiently and how to reason about the tradeoffs that arise when designing a BUCKWILD system. In this dissertation, we will address this issue by proposing a new way to conceptualize how precision is used in a stochastic gradient descent implementation. We will also describe several software optimizations and hardware changes that can significantly improve the throughput of BUCKWILD SGD. We hope that these techniques, added to our theoretical guarantees, will enable future practitioners to develop very fast stochastic algorithms that are still guaranteed to produce accurate results.
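To make the combination described above concrete, the following is a minimal sketch of a BUCKWILD-style update loop: several threads run SGD on a shared model stored in 8-bit fixed point, with no locking between them, and each write is quantized with stochastic rounding. The least-squares model, dimensions, scale factor, learning rate, and all names here are illustrative assumptions, not the implementation evaluated in the dissertation.

```cpp
// Sketch of a BUCKWILD-style worker: HOGWILD-style lock-free parallel SGD on a
// model stored in 8-bit fixed point. Illustrative assumptions throughout.
#include <cmath>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>
#include <vector>

constexpr int   kDim   = 64;             // model dimension (assumed)
constexpr float kScale = 1.0f / 256.0f;  // fixed point: weight = q * kScale
constexpr float kLr    = 0.05f;          // learning rate (assumed)

// Shared model: one quantized int8 value per coordinate. Threads read and
// write these entries with no locks; the resulting races are tolerated.
// (A production implementation would typically use relaxed atomics here.)
std::vector<std::int8_t> model(kDim, 0);

// Quantize a real-valued weight onto the int8 grid with stochastic rounding,
// so that the rounding error is zero in expectation.
std::int8_t stochastic_round(float w, std::mt19937 &rng) {
    float scaled = w / kScale;
    float lo = std::floor(scaled);
    std::uniform_real_distribution<float> u(0.0f, 1.0f);
    float q = lo + ((u(rng) < scaled - lo) ? 1.0f : 0.0f);
    if (q < -128.0f) q = -128.0f;   // saturate to the int8 range
    if (q >  127.0f) q =  127.0f;
    return static_cast<std::int8_t>(q);
}

// One worker: repeatedly sample an example, compute a stochastic gradient of
// the squared loss in float, and write the quantized step back unsynchronized.
void worker(const std::vector<std::vector<float>> &X,
            const std::vector<float> &y,
            int num_steps, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    for (int t = 0; t < num_steps; ++t) {
        std::size_t i = pick(rng);
        float pred = 0.0f;                       // dequantize and predict
        for (int j = 0; j < kDim; ++j)
            pred += model[j] * kScale * X[i][j];
        float err = pred - y[i];
        for (int j = 0; j < kDim; ++j) {
            float w = model[j] * kScale;         // racy read
            model[j] = stochastic_round(w - kLr * err * X[i][j], rng);  // racy write
        }
    }
}

// Launch several workers on the same shared model, HOGWILD-style: no locks,
// no coordination, every thread updates whichever coordinates it touches.
void run_buckwild(const std::vector<std::vector<float>> &X,
                  const std::vector<float> &y,
                  int num_threads, int steps_per_thread) {
    std::vector<std::thread> threads;
    for (int k = 0; k < num_threads; ++k)
        threads.emplace_back(worker, std::cref(X), std::cref(y),
                             steps_per_thread, 1234u + k);
    for (auto &t : threads) t.join();
}
```

The rounding scheme is the design choice that matters in this sketch: stochastic rounding keeps the quantization noise zero-mean, which is the kind of structure that allows quantization error, like the race conditions, to be treated as just another tolerable source of noise in the analysis.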

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2017
Issuance monographic
Language English

Creators/Contributors

Associated with De Sa, Christopher
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Olukotun, Oyekunle Ayinde
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Boyd, Stephen
Thesis advisor Ré, Christopher

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Christopher De Sa.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2017.
Location electronic resource

Access conditions

Copyright
© 2017 by Christopher Matthew De Sa
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
