Mean field limit in neural network learning : autoencoders and multilayer networks
- A major outstanding theoretical challenge in deep learning is understanding the learning dynamics of neural networks. The difficulty arises from the highly nonlinear and large-scale structure of the network architecture, usually involving a large number of neurons at each layer, and from the non-convex nature of the optimization problem, typically solved by convexity-inspired gradient-based learning rules without strong guarantees. This raises two questions: Given this complexity, is it possible to obtain a succinct description of the network's behavior over the course of training? If so, can such a description shed light on properties of the learning process of neural networks? We explore these questions in a scaling limit regime that gives rise to one such description: the mean field limit. In this regime, the number of neurons is taken to infinity, and yet the network's behavior under gradient descent training converges to a nontrivial and nonlinear dynamical limit. The literature on the mean field limit for neural networks is fairly recent and has focused on two-layer feedforward networks. In this thesis, we analyze the mean field limit for two other important classes of models: weight-tied two-layer autoencoders and multilayer networks. The class of autoencoders constitutes a unique example of two-layer neural networks for unsupervised learning. It is among the rare instances known to date in which we can derive an explicit solution to the mean field limit. This allows us to gain an in-depth understanding of what the model learns about the high-dimensional data. The derived theory offers a striking match with empirical simulations on real-world data. This example also gives rise to a challenging mathematical problem that deviates from previous analyses and inspires a new proof technique, as well as an open conjecture. The class of multilayer neural networks is the main thrust behind the recent breakthrough of deep learning.
Being fundamentally different from its two-layer counterpart, this class requires completely new ideas and insights. We show the existence of the mean field limit for this class of models via two approaches. In the first approach, we develop a formalism built on a new idea about the operational meaning of the neurons, which is a priori unobservable but allows one to argue for the existence of a mean field limit. In the second approach, we develop a mathematically rigorous framework, used to prove properties of multilayer networks under training, built on a new idea of a continuum that interpolates from finiteness to infinitude. In both approaches, we see a complete departure from the convex paradigm and welcome new insights that are unique to neural networks.
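As a minimal numerical sketch of the mean-field scaling discussed above (an illustration under assumptions of this record, not material from the thesis): a two-layer network is written as an average over neurons, f(x) = (1/N) Σ_i a_i tanh(w_i · x), and trained by full-batch gradient descent run on the mean-field time scale (the learning rate multiplies N times the gradient, so each neuron moves at an O(1) speed as N grows). The teacher data, widths, and step counts below are arbitrary toy choices; the point is that two independently initialized networks of different large widths end up tracing nearly the same output function, as the mean field limit predicts.

```python
import numpy as np

def train_mean_field(N, steps=300, lr=0.1, seed=0):
    """Full-batch gradient descent on a width-N two-layer net in the
    mean-field scaling f(x) = (1/N) * sum_i a_i * tanh(w_i . x).
    Updates use N * gradient (the mean-field time scale), so per-neuron
    motion stays O(1) as N grows. Toy synthetic data, for illustration."""
    d, n = 5, 50
    data_rng = np.random.default_rng(123)           # data shared across widths
    X = data_rng.standard_normal((n, d))
    y = np.tanh(X @ data_rng.standard_normal(d))    # synthetic teacher labels
    rng = np.random.default_rng(seed)               # width-specific init
    a = rng.standard_normal(N)
    W = rng.standard_normal((N, d))
    losses = []
    for _ in range(steps):
        h = np.tanh(W @ X.T)            # (N, n) neuron activations
        pred = (a @ h) / N              # mean-field average over neurons
        err = pred - y                  # (n,) residuals
        losses.append(0.5 * np.mean(err ** 2))
        # N * gradient of the mean squared loss, per neuron
        a -= lr * (h @ err) / n
        W -= lr * (a[:, None] * ((err * (1.0 - h ** 2)) @ X)) / n
    return (a @ np.tanh(W @ X.T)) / N, losses

# Two widths, independent initializations, same data.
pred_small, losses_small = train_mean_field(N=500, seed=1)
pred_big, losses_big = train_mean_field(N=4000, seed=2)
gap = np.max(np.abs(pred_small - pred_big))
```

Despite the independent random initializations, `gap` is small: as N grows, the empirical distribution of the neurons concentrates and both trajectories follow the same limiting dynamics, which is the sense in which the training of the finite network admits a succinct infinite-width description.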
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource
|Nguyẽ̂n, Phan Minh
|Degree committee member
|Stanford University, Department of Electrical Engineering.
|Statement of responsibility
|Phan Minh Nguyen
|Submitted to the Department of Electrical Engineering
|Thesis (Ph.D.)--Stanford University, 2020.
- © 2020 by Phan Minh Nguyen
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).