An Unexpected Challenge in Using Forward-Mode Automatic Differentiation for Low-Memory Deep Learning
Abstract/Contents
- Abstract
In this thesis, I study a simple randomized algorithm for training neural networks with extremely low memory overhead: "guess the gradient" (GTG). I describe how to efficiently compute the directional derivative of the network's loss along a randomly hypothesized gradient, and use this information to refine the hypothesis into a noisy unbiased gradient estimator that can be passed to a standard gradient descent optimizer.
Previous theoretical work has concluded that in convex settings, GTG-like algorithms suffer an O(N) slowdown for N-dimensional problems, making them impractical for large-scale deep learning. However, because the directional derivative can be computed without backpropagation, GTG can be run using very little memory. This valuable property, along with the possibility of a simple (novel, to our knowledge) variance reduction technique, encourages us to nonetheless try applying GTG in memory-bound deep learning settings.
We find that in practice GTG does not perform well on a standard deep learning optimization task — but, curiously, not for the "obvious" reason of O(N)-slower convergence. In early phases of training GTG indeed does as well as SGD with a comparable step size; however, in later phases we observe a sudden "plateauing" phenomenon that is as yet unexplained. Understanding this phenomenon could suggest a way to make GTG practical, or, failing that, shed light on the surprising effectiveness of SGD.
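To make the estimator concrete, here is a minimal sketch of one GTG step, written in JAX (this is an illustration under my own assumptions, not code from the thesis; the toy linear model, the name `gtg_gradient_estimate`, and the hyperparameters are all hypothetical). The idea: sample a random Gaussian direction `v`, compute the directional derivative of the loss along `v` with forward-mode AD (`jax.jvp`, no backpropagation), and scale `v` by that scalar. Since E[v vᵀ] = I when v ~ N(0, I), the product (∇L·v) v is an unbiased estimate of ∇L.

```python
# A minimal sketch of a "guess the gradient" (GTG) step using forward-mode
# AD in JAX. Illustrative only; not the thesis's actual code.
import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Hypothetical tiny model: linear regression with squared error.
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def gtg_gradient_estimate(params, x, y, key):
    # Guess: a random Gaussian direction with the same shape as params.
    v = jax.random.normal(key, params.shape)
    # Forward-mode AD computes the directional derivative <grad L, v>
    # alongside the forward pass, without storing activations for backprop.
    _, dir_deriv = jax.jvp(lambda p: loss(p, x, y), (params,), (v,))
    # Refine the guess: scale the direction by the directional derivative.
    # E[<g, v> v] = g when v ~ N(0, I), so this estimate is unbiased.
    return dir_deriv * v

# Usage: feed the noisy estimate to plain SGD (toy data, illustrative).
key = jax.random.PRNGKey(0)
params = jnp.zeros(4)
x = jax.random.normal(jax.random.PRNGKey(1), (32, 4))
y = x @ jnp.array([1.0, -2.0, 0.5, 3.0])
for step in range(100):
    key, sub = jax.random.split(key)
    g_hat = gtg_gradient_estimate(params, x, y, sub)
    params = params - 0.01 * g_hat
```

The memory saving in this formulation comes from `jax.jvp` evaluating the tangent together with the forward pass, so no intermediate activations need to be retained for a backward pass.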
Description
Type of resource | text
---|---
Date created | May 4, 2021
Date modified | February 18, 2022; December 5, 2022
Publication date | May 21, 2021
Creators/Contributors
Author | Chandra, Kartik
---|---
Degree granting institution | Stanford University, Department of Computer Science
Thesis advisor | Valiant, Gregory
Thesis advisor | Tan, Li-Yang
Subjects
Subject | optimization
---|---
Subject | automatic differentiation
Subject | deep learning
Subject | machine learning
Subject | systems
Subject | Stanford University
Subject | Computer Science Department
Subject | School of Engineering
Subject | Computer Science
Subject | Honors Thesis
Genre | Text
Genre | Thesis
Bibliographic information
Location | https://purl.stanford.edu/wk389rs3026 |
Access conditions
- Use and reproduction
- User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
Preferred citation
- Preferred citation
- Chandra, Kartik. (2021). An Unexpected Challenge in Using Forward-Mode Automatic Differentiation for Low-Memory Deep Learning. Stanford Digital Repository. Available at: https://purl.stanford.edu/wk389rs3026
Collection
Undergraduate Theses, School of Engineering
Contact information
- Contact
- engreference@stanford.edu