Computational and statistical theories for large-scale neural networks
Abstract/Contents
- Abstract
- Deep learning methods operate in regimes that defy traditional computational and statistical mindsets. Despite the non-convexity of empirical risks and the huge complexity of neural network architectures, stochastic gradient algorithms can often find an approximate global minimizer of the training loss and achieve small generalization error on test data. An important research direction in recent years has been to explain theoretically this observed optimization efficiency and generalization efficacy of neural networks. This thesis tackles these challenges in the setting of two-layer neural networks, by analyzing their computational and statistical properties in various scaling limits. On the computational side, we introduce two competing theories for neural network dynamics: the mean field theory and the tangent kernel theory. These two theories characterize the training dynamics of neural networks in different regimes that exhibit different behaviors. In the mean field framework, the training dynamics, in the large-neuron limit, is captured by a particular non-linear partial differential equation. This characterization allows us to prove global convergence of the dynamics in certain scenarios. By comparison, the tangent kernel theory characterizes the same dynamics in a different scaling limit and provides global convergence guarantees in more general scenarios. On the statistical side, we study the generalization properties of neural networks trained in the two regimes described above. We first show that, in the high-dimensional limit, neural tangent kernels are no better than polynomial regression, while neural networks trained in the mean field regime can potentially perform better. Next, we study in more detail the random features model, which is equivalent to a two-layer neural network in the kernel regime. We compute the precise asymptotics of its test error in the high-dimensional limit and confirm that it exhibits the interesting double-descent curve observed in experiments.
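The random features model mentioned in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the thesis's construction: it fixes random first-layer ReLU weights, fits only the second layer by (near-interpolating) ridge regression, and records the test error as the number of features N varies past the sample size n. All data dimensions, the linear target, and the ridge level are hypothetical choices for demonstration; sweeping N across the interpolation threshold N ≈ n is where the double-descent shape is typically observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes: input dimension d, train size n, test size n_test.
d, n, n_test = 20, 200, 1000

# Simple noisy linear target (illustrative stand-in for the true data model).
beta = rng.standard_normal(d) / np.sqrt(d)
X_train = rng.standard_normal((n, d))
y_train = X_train @ beta + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((n_test, d))
y_test = X_test @ beta

def rf_test_error(N, ridge=1e-8):
    """Test MSE of ridge regression on N random ReLU features.

    Only the second layer is trained; the first-layer weights W are
    drawn at random and frozen, as in the random features model.
    """
    W = rng.standard_normal((d, N)) / np.sqrt(d)
    Z_train = np.maximum(X_train @ W, 0.0)  # random ReLU features
    Z_test = np.maximum(X_test @ W, 0.0)
    # Ridge solution a = (Z'Z + ridge * I)^{-1} Z'y.
    a = np.linalg.solve(Z_train.T @ Z_train + ridge * np.eye(N),
                        Z_train.T @ y_train)
    return float(np.mean((Z_test @ a - y_test) ** 2))

# Sweep N through under-, near-, and over-parameterized regimes.
errors = {N: rf_test_error(N) for N in [50, 100, 200, 400, 1600]}
```

With a near-zero ridge penalty, the test error typically spikes near N = n (where the feature matrix becomes square and ill-conditioned) and decreases again in the overparameterized regime, which is the qualitative double-descent behavior the abstract refers to.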
Description
Type of resource | text
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | California
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | 2020; ©2020
Publication date | 2020
Issuance | monographic
Language | English
Creators/Contributors
Author | Mei, Song (Researcher in data science)
Degree supervisor | Montanari, Andrea
Thesis advisor | Montanari, Andrea
Thesis advisor | Johnstone, Iain
Thesis advisor | Ying, Lexing
Degree committee member | Johnstone, Iain
Degree committee member | Ying, Lexing
Associated with | Stanford University, Institute for Computational and Mathematical Engineering
Subjects
Genre | Theses
Genre | Text
Bibliographic information
Statement of responsibility | Song Mei
Note | Submitted to the Institute for Computational and Mathematical Engineering
Thesis | Thesis (Ph.D.), Stanford University, 2020
Location | electronic resource
Access conditions
- Copyright
- © 2020 by Song Mei
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).