Computational and statistical theories for large-scale neural networks
Abstract/Contents
- Abstract
- Deep learning methods operate in regimes that defy traditional computational and statistical mindsets. Despite the non-convexity of empirical risks and the huge complexity of neural network architectures, stochastic gradient algorithms can often find an approximate global minimizer of the training loss and achieve small generalization error on test data. An important research direction in recent years has been to explain theoretically this observed optimization efficiency and generalization efficacy of neural networks. This thesis tackles these challenges in the setting of two-layer neural networks, by analyzing their computational and statistical properties in various scaling limits. On the computational side, we introduce two competing theories for neural network dynamics: the mean field theory and the tangent kernel theory. These two theories characterize the training dynamics of neural networks in different regimes that exhibit different behaviors. In the mean field framework, the training dynamics, in the large-neuron limit, is captured by a particular non-linear partial differential equation. This characterization allows us to prove global convergence of the dynamics in certain scenarios. By comparison, the tangent kernel theory characterizes the same dynamics in a different scaling limit and provides global convergence guarantees in more general scenarios. On the statistical side, we study the generalization properties of neural networks trained in the two regimes described above. We first show that, in the high-dimensional limit, neural tangent kernels are no better than polynomial regression, while neural networks trained in the mean field regime can potentially perform better. Next, we study in more detail the random features model, which is equivalent to a two-layer neural network in the kernel regime. We compute the precise asymptotics of its test error in the high-dimensional limit and confirm that it exhibits the interesting double-descent curve observed in experiments.
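The random features model mentioned in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the thesis's construction: it fixes random first-layer ReLU weights, fits only the second layer by (near-interpolating) ridge regression, and records the test error as the number of features N varies past the sample size n. All data dimensions, the linear target, and the ridge level are hypothetical choices for demonstration; sweeping N across the interpolation threshold N ≈ n is where the double-descent shape is typically observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes: input dimension d, train size n, test size n_test.
d, n, n_test = 20, 200, 1000

# Simple noisy linear target (illustrative stand-in for the true data model).
beta = rng.standard_normal(d) / np.sqrt(d)
X_train = rng.standard_normal((n, d))
y_train = X_train @ beta + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((n_test, d))
y_test = X_test @ beta

def rf_test_error(N, ridge=1e-8):
    """Test MSE of ridge regression on N random ReLU features.

    Only the second layer is trained; the first-layer weights W are
    drawn at random and frozen, as in the random features model.
    """
    W = rng.standard_normal((d, N)) / np.sqrt(d)
    Z_train = np.maximum(X_train @ W, 0.0)  # random ReLU features
    Z_test = np.maximum(X_test @ W, 0.0)
    # Ridge solution a = (Z'Z + ridge * I)^{-1} Z'y.
    a = np.linalg.solve(Z_train.T @ Z_train + ridge * np.eye(N),
                        Z_train.T @ y_train)
    return float(np.mean((Z_test @ a - y_test) ** 2))

# Sweep N through under-, near-, and over-parameterized regimes.
errors = {N: rf_test_error(N) for N in [50, 100, 200, 400, 1600]}
```

With a near-zero ridge penalty, the test error typically spikes near N = n (where the feature matrix becomes square and ill-conditioned) and decreases again in the overparameterized regime, which is the qualitative double-descent behavior the abstract refers to.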
Description
Type of resource | text
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | California
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | 2020; ©2020
Publication date | 2020
Issuance | monographic
Language | English
Creators/Contributors
Author | Mei, Song (Researcher in data science)
Degree supervisor | Montanari, Andrea
Thesis advisor | Montanari, Andrea
Thesis advisor | Johnstone, Iain
Thesis advisor | Ying, Lexing
Degree committee member | Johnstone, Iain
Degree committee member | Ying, Lexing
Associated with | Stanford University, Institute for Computational and Mathematical Engineering
Subjects
Genre | Theses
Genre | Text
Bibliographic information
Statement of responsibility | Song Mei
Note | Submitted to the Institute for Computational and Mathematical Engineering
Thesis | Thesis (Ph.D.), Stanford University, 2020
Location | electronic resource
Access conditions
- Copyright
- © 2020 by Song Mei
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).