Mean field limit in neural network learning : autoencoders and multilayer networks
- A major outstanding theoretical challenge in deep learning is understanding the learning dynamics of neural networks. The difficulty arises from the highly nonlinear and large-scale structure of the network architecture, usually involving a large number of neurons at each layer, and from the non-convex nature of the optimization problem, typically solved by convexity-inspired gradient-based learning rules without strong guarantees. This raises two questions: Given this complexity, is it possible to obtain a succinct description of the network's behavior over the course of training? If so, can such a description shed light on properties of the learning process of neural networks? We explore these questions in a scaling limit regime that gives rise to one such description: the mean field limit. In this regime, the number of neurons is taken to infinity, and yet the network's behavior under gradient descent training converges to a nontrivial and nonlinear dynamical limit. The literature on the mean field limit for neural networks is fairly recent and has focused on two-layer feedforward networks. In this thesis, we analyze the mean field limit for two other important classes of models: weight-tied two-layer autoencoders and multilayer networks. The class of autoencoders constitutes a unique example of two-layer neural networks for unsupervised learning. It is among the rare instances known to date in which we can derive an explicit solution to the mean field limit. This allows us to gain an in-depth understanding of what the model learns about the high-dimensional data. The derived theory offers a striking match with empirical simulations on real-world data. This example also gives rise to a challenging mathematical problem that deviates from previous analyses and inspires a new proof technique, as well as an open conjecture. The class of multilayer neural networks is the main thrust behind the recent breakthrough of deep learning.
Being fundamentally different from its two-layer counterpart, this class requires completely new ideas and insights. We show the existence of the mean field limit for this class of models via two approaches. In the first approach, we develop a formalism built on a new idea about the operational meaning of the neurons, which is a priori unobservable but allows one to argue for the existence of a mean field limit. In the second approach, we develop a mathematically rigorous framework, used to prove properties of multilayer networks under training, built on a new idea of a continuum that interpolates from finiteness to infinitude. In both approaches, we see a complete departure from the convex paradigm and welcome new insights that are unique to neural networks.
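As a minimal numerical sketch of the mean-field scaling discussed above (an illustration under assumptions of this record, not material from the thesis): a two-layer network is written as an average over neurons, f(x) = (1/N) Σ_i a_i tanh(w_i · x), and trained by full-batch gradient descent run on the mean-field time scale (the learning rate multiplies N times the gradient, so each neuron moves at an O(1) speed as N grows). The teacher data, widths, and step counts below are arbitrary toy choices; the point is that two independently initialized networks of different large widths end up tracing nearly the same output function, as the mean field limit predicts.

```python
import numpy as np

def train_mean_field(N, steps=300, lr=0.1, seed=0):
    """Full-batch gradient descent on a width-N two-layer net in the
    mean-field scaling f(x) = (1/N) * sum_i a_i * tanh(w_i . x).
    Updates use N * gradient (the mean-field time scale), so per-neuron
    motion stays O(1) as N grows. Toy synthetic data, for illustration."""
    d, n = 5, 50
    data_rng = np.random.default_rng(123)           # data shared across widths
    X = data_rng.standard_normal((n, d))
    y = np.tanh(X @ data_rng.standard_normal(d))    # synthetic teacher labels
    rng = np.random.default_rng(seed)               # width-specific init
    a = rng.standard_normal(N)
    W = rng.standard_normal((N, d))
    losses = []
    for _ in range(steps):
        h = np.tanh(W @ X.T)            # (N, n) neuron activations
        pred = (a @ h) / N              # mean-field average over neurons
        err = pred - y                  # (n,) residuals
        losses.append(0.5 * np.mean(err ** 2))
        # N * gradient of the mean squared loss, per neuron
        a -= lr * (h @ err) / n
        W -= lr * (a[:, None] * ((err * (1.0 - h ** 2)) @ X)) / n
    return (a @ np.tanh(W @ X.T)) / N, losses

# Two widths, independent initializations, same data.
pred_small, losses_small = train_mean_field(N=500, seed=1)
pred_big, losses_big = train_mean_field(N=4000, seed=2)
gap = np.max(np.abs(pred_small - pred_big))
```

Despite the independent random initializations, `gap` is small: as N grows, the empirical distribution of the neurons concentrates and both trajectories follow the same limiting dynamics, which is the sense in which the training of the finite network admits a succinct infinite-width description.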
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource
|Nguyẽ̂n, Phan Minh
|Degree committee member
|Stanford University, Department of Electrical Engineering.
|Statement of responsibility
|Phan Minh Nguyen
|Submitted to the Department of Electrical Engineering
|Thesis (Ph.D.)--Stanford University, 2020.
- © 2020 by Phan Minh Nguyen
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).