Optimization and high-dimensional loss landscapes in deep learning

Abstract/Contents

Abstract
Despite deep learning's impressive success, many questions remain concerning how training such high-dimensional models behaves in practice and why it reliably produces useful networks. We employ an empirical approach, performing experiments guided by theoretical predictions, to study the following questions through the lens of the loss landscape.

(1) How do loss landscape properties affect the success or failure of weight-pruning methods? Recent work on two fronts -- the lottery ticket hypothesis and training restricted to random subspaces -- has demonstrated that deep neural networks can be successfully optimized using far fewer degrees of freedom than the total number of parameters. In particular, lottery tickets, or sparse subnetworks capable of matching the full model's accuracy, can be identified via iterative pruning and retraining of the weights. We first provide a framework for the success of low-dimensional training in terms of the high-dimensional geometry of the loss landscape. We then leverage this framework both to better understand the success of lottery tickets and to predict how aggressively the weights can be pruned at each iteration.

(2) What are the algorithmic advantages of recurrent connections in neural networks? One of the brain's most striking anatomical features is the ubiquity of lateral and recurrent connections. Yet while the strong computational abilities of feedforward networks have been extensively studied, understanding the role of recurrent computations that might explain their prevalence remains an important open challenge. We demonstrate that recurrent connections are efficient for performing tasks that can be solved via repeated, local propagation of information, and we propose that they can be combined with feedforward architectures for efficient computation across timescales.
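The abstract names two low-dimensional training schemes studied in the thesis: identifying lottery tickets by iterative magnitude pruning with retraining, and optimizing within a fixed random subspace of the weights. As a rough illustration only -- the thesis works with deep networks, while the toy linear-regression task, dimensions, and schedules below are invented for this sketch -- the following NumPy code shows the core loop of each idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task with a sparse ground truth, standing in for a network.
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d) * (rng.random(d) < 0.2)
y = X @ w_true + 0.01 * rng.normal(size=n)

def train(w, mask, steps=500, lr=0.05):
    """Gradient descent on squared loss; pruned coordinates are held at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / n
        w = w - lr * grad * mask
    return w * mask

w_init = 0.1 * rng.normal(size=d)  # the initialization each round rewinds to
mask = np.ones(d)
prune_frac = 0.2                   # fraction of surviving weights pruned per round

# Iterative magnitude pruning with rewinding: train, drop the smallest-magnitude
# surviving weights, rewind the rest to w_init, and retrain.
for rnd in range(5):
    w = train(w_init.copy(), mask)
    loss = np.mean((X @ w - y) ** 2)
    print(f"round {rnd}: {int(mask.sum()):2d}/{d} weights, loss {loss:.4f}")
    survivors = np.flatnonzero(mask)
    k = int(prune_frac * len(survivors))
    drop = survivors[np.argsort(np.abs(w[survivors]))[:k]]
    mask[drop] = 0.0
```

Random-subspace training instead fixes a random projection and optimizes only the low-dimensional coordinates, so the loss landscape is probed along a random d_sub-dimensional slice through the initialization:

```python
# Optimize z in R^{d_sub}; the full weights are w_init + P @ z for a fixed random P.
d_sub = 10
P = rng.normal(size=(d, d_sub)) / np.sqrt(d)
z = np.zeros(d_sub)
for _ in range(500):
    w = w_init + P @ z
    grad_w = X.T @ (X @ w - y) / n
    z -= 0.05 * P.T @ grad_w          # chain rule: dL/dz = P^T dL/dw
print("subspace loss:", np.mean((X @ (w_init + P @ z) - y) ** 2))
```

Whether such a random slice hits a low-loss region, and how sparse the pruning mask can get before accuracy collapses, are the questions the thesis frames in terms of high-dimensional loss landscape geometry.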

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Larsen, Brett William
Thesis advisor Druckmann, Shaul
Thesis advisor Ganguli, Surya, 1977-
Degree committee member Goldhaber-Gordon, David, 1972-
Associated with Stanford University, Department of Physics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Brett W. Larsen.
Note Submitted to the Department of Physics.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/yj314kt7539

Access conditions

Copyright
© 2022 by Brett William Larsen
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
