Computational and statistical theories for large-scale neural networks


Abstract/Contents

Abstract
Deep learning methods operate in regimes that defy traditional computational and statistical mindsets. Despite the non-convexity of empirical risks and the huge complexity of neural network architectures, stochastic gradient algorithms can often find an approximate global minimizer of the training loss and achieve small generalization error on test data. In recent years, an important research direction has been to theoretically explain this observed optimization efficiency and generalization efficacy of neural network systems. This thesis tackles these challenges in the model of two-layer neural networks, by analyzing its computational and statistical properties in various scaling limits. On the computational side, we introduce two competing theories for neural network dynamics: the mean field theory and the tangent kernel theory. These two theories characterize the training dynamics of neural networks in different regimes that exhibit different behaviors. In the mean field framework, the training dynamics, in the large-neuron limit, is captured by a particular non-linear partial differential equation. This characterization allows us to prove global convergence of the dynamics in certain scenarios. By comparison, the tangent kernel theory characterizes the same dynamics in a different scaling limit and provides global convergence guarantees in more general scenarios. On the statistical side, we study the generalization properties of neural networks trained in the two regimes described above. We first show that, in the high-dimensional limit, neural tangent kernels are no better than polynomial regression, while neural networks trained in the mean field regime can potentially perform better. Next, we study more carefully the random features model, which is equivalent to a two-layer neural network in the kernel regime. We compute the precise asymptotics of its test error in the high-dimensional limit and confirm that it exhibits the interesting double-descent curve observed in experiments.
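For concreteness, the non-linear partial differential equation alluded to above can be illustrated as follows. This is a minimal sketch assuming the standard mean-field formulation for a two-layer network $f(x; \Theta) = \frac{1}{N}\sum_{i=1}^{N} \sigma(x; \theta_i)$; the symbols $\rho_t$, $\Psi$, $V$, $U$ below are illustrative, not quoted from the thesis. In the limit $N \to \infty$, the empirical distribution of the neuron parameters $\theta_i$ under (stochastic) gradient descent evolves according to the distributional dynamics
$$\partial_t \rho_t = \nabla_\theta \cdot \big( \rho_t \, \nabla_\theta \Psi(\theta; \rho_t) \big), \qquad \Psi(\theta; \rho) = V(\theta) + \int U(\theta, \tilde{\theta}) \, \rho(\mathrm{d}\tilde{\theta}),$$
where $V$ encodes the fit to the data and $U$ the pairwise interaction between neurons; global convergence results in the mean-field regime can then be read as statements about this PDE.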

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2020
Publication date 2020
Issuance monographic
Language English

Creators/Contributors

Author Mei, Song (Researcher in data science)
Degree supervisor Montanari, Andrea
Thesis advisor Montanari, Andrea
Thesis advisor Johnstone, Iain
Thesis advisor Ying, Lexing
Degree committee member Johnstone, Iain
Degree committee member Ying, Lexing
Associated with Stanford University, Institute for Computational and Mathematical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Song Mei
Note Submitted to the Institute for Computational and Mathematical Engineering
Thesis Ph.D., Stanford University, 2020
Location electronic resource

Access conditions

Copyright
© 2020 by Song Mei
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
