Mean field limit in neural network learning: autoencoders and multilayer networks


Abstract/Contents

Abstract
A major outstanding theoretical challenge in deep learning is to understand the learning dynamics of neural networks. The difficulty arises from the highly nonlinear, large-scale structure of the network architecture, which usually involves a large number of neurons at each layer, and from the non-convex nature of the optimization problem, typically solved by convexity-inspired gradient-based learning rules without any strong guarantees. This raises two questions: given such complexity, is it possible to obtain a succinct description of the network's behavior over the course of training? If so, can it be used to shed light on properties of the learning process of neural networks?

We explore these questions in a scaling limit regime that gives rise to one such description: the mean field limit. In this regime, the number of neurons is taken to infinity, and yet the network's behavior under gradient descent training converges to a nontrivial and nonlinear dynamical limit. The literature on the mean field limit for neural networks is fairly recent and has focused on two-layer feedforward networks. In this thesis, we analyze the mean field limit for two other important classes of models: weight-tied two-layer autoencoders and multilayer networks.

The class of autoencoders constitutes a unique example of two-layer neural networks for unsupervised learning. It is among the rare instances known to date in which an explicit solution to the mean field limit can be derived. This allows us to gain an in-depth understanding of what the model learns about high-dimensional data. The derived theory shows a striking match with empirical simulations on real-life data. This example also gives rise to a challenging mathematical problem that deviates from previous analyses and inspires a new proof technique, as well as an open conjecture.

The class of multilayer neural networks is the main thrust behind the recent breakthrough of deep learning. Being fundamentally different from its two-layer counterpart, it requires completely new ideas and insights. We show the existence of the mean field limit for this class of models via two approaches. In the first, we develop a formalism built on a new idea about the operational meaning of the neurons, which is a priori unobservable but allows one to reason about the existence of a mean field limit. In the second, we develop a mathematically rigorous framework, used to prove properties of multilayer networks under training, built on a new idea of a continuum that interpolates from finiteness to infinitude. In both approaches, we see a complete departure from the convex paradigm and welcome new insights that are unique to neural networks.
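For context, a minimal sketch of the standard two-layer mean field formulation that the abstract builds on; the notation here is illustrative and not necessarily the thesis's own. A width-N two-layer network with 1/N output scaling,

\[
  f_N(x) \;=\; \frac{1}{N} \sum_{i=1}^{N} \sigma_*(x;\theta_i),
  \qquad
  \hat\rho_N \;=\; \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i},
\]

is equivalently described by the empirical measure \(\hat\rho_N\) of its neuron parameters. As \(N \to \infty\), gradient descent training of \(\theta_1,\dots,\theta_N\) converges, under suitable conditions, to a deterministic evolution of a limiting measure \(\rho_t\) of McKean--Vlasov type,

\[
  \partial_t \rho_t \;=\; \nabla_\theta \cdot \bigl( \rho_t \, \nabla_\theta \Psi(\theta;\rho_t) \bigr),
\]

where \(\Psi(\theta;\rho)\) is the data-averaged potential felt by a single neuron interacting with the population \(\rho\). This nonlinear distributional dynamics is the "mean field limit" referred to above.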

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2020
Publication date 2020
Issuance monographic
Language English

Creators/Contributors

Author Nguyễn, Phan Minh
Degree supervisor Montanari, Andrea
Thesis advisor Montanari, Andrea
Thesis advisor Özgür, Ayfer
Thesis advisor Tse, David
Degree committee member Özgür, Ayfer
Degree committee member Tse, David
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Phan Minh Nguyen
Note Submitted to the Department of Electrical Engineering
Thesis Thesis (Ph.D.), Stanford University, 2020
Location electronic resource

Access conditions

Copyright
© 2020 by Phan Minh Nguyen
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
