When do gradient methods work well in non-convex learning problems?


Abstract/Contents

Abstract
Classical learning theory advocates convex losses in machine learning because they guarantee computational efficiency, yet the recent success of deep learning suggests that highly non-convex losses can often be minimized efficiently by simple gradient-based algorithms, a phenomenon not well explained by existing theory. Towards closing this gap, this thesis proposes and instantiates landscape analysis, a theoretical framework for studying the optimization of non-convex learning problems by analyzing optimization-relevant geometric properties of the empirical risk.

In the first part of the thesis, we establish a generic property of the optimization landscape of smooth non-convex empirical risks: their gradients and Hessians concentrate uniformly around those of the population risk at a statistical rate. Consequently, benign landscape properties of the population risk carry over to the empirical risk when the sample size is sufficiently large. We apply this principle to show that gradient descent succeeds in solving several non-convex empirical risk minimization problems, including non-convex binary classification (and its sparse version), robust linear regression, and Gaussian mixture estimation.

In the second part, we apply landscape analysis to the orthogonal dictionary learning problem, solved by minimizing a non-convex, non-smooth L1 risk on the sphere, an approach that works well empirically but is not well understood in theory. Standard uniform convergence analysis fails on this problem because the subgradients are set-valued and non-Lipschitz. We develop tools, building on the Hausdorff distance between sets and a novel sign-aware metric customized for the dictionary learning problem, to handle these issues and establish uniform convergence of the subgradients. We show that the landscape of the non-smooth empirical risk is benign and that subgradient descent recovers the dictionary with a lower sample complexity than existing approaches based on non-convex optimization.
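To make the uniform concentration claim in the first part concrete, the display below is an illustrative sketch of the kind of bound meant, where \widehat{R}_n denotes the empirical risk on n samples, R the population risk, \Theta the parameter set, and d the dimension; the exact norms and rates in the thesis are problem-dependent:

\[
\sup_{\theta \in \Theta} \bigl\| \nabla \widehat{R}_n(\theta) - \nabla R(\theta) \bigr\|_2 \lesssim \sqrt{\frac{d \log n}{n}},
\qquad
\sup_{\theta \in \Theta} \bigl\| \nabla^2 \widehat{R}_n(\theta) - \nabla^2 R(\theta) \bigr\|_{\mathrm{op}} \lesssim \sqrt{\frac{d \log n}{n}}.
\]

The algorithm studied in the second part can also be illustrated concretely. The following Python sketch runs Riemannian subgradient descent on the unit sphere for the non-smooth risk f(q) = (1/m) * ||Y^T q||_1, where Y = AX is generated by an orthogonal dictionary A and sparse codes X. The function name, step-size schedule, and data parameters are illustrative assumptions, not the thesis's exact algorithm or constants.

import numpy as np

def dict_learning_subgrad(Y, steps=500, eta=0.1, seed=0):
    """Riemannian subgradient descent for f(q) = (1/m) * ||Y^T q||_1 on the
    unit sphere; aims to recover (up to sign) one column of the orthogonal
    dictionary. A minimal sketch, not the thesis's exact method."""
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for k in range(steps):
        g = Y @ np.sign(Y.T @ q) / m      # a Euclidean subgradient of the L1 risk
        g -= (q @ g) * q                  # project onto the tangent space at q
        q -= (eta / np.sqrt(k + 1)) * g   # subgradient step with decaying size
        q /= np.linalg.norm(q)            # retract back to the sphere
    return q

# Usage on synthetic data Y = A X with orthogonal A and sparse
# Bernoulli-Gaussian codes X (sparsity level theta; all values assumed):
n, m, theta = 30, 50_000, 0.1
rng = np.random.default_rng(1)
A, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = rng.standard_normal((n, m)) * (rng.random((n, m)) < theta)
Y = A @ X
q = dict_learning_subgrad(Y)
print(np.max(np.abs(A.T @ q)))  # approaches 1 as q aligns with a dictionary column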

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Bai, Yu (Statistician)
Degree supervisor Duchi, John
Thesis advisor Duchi, John
Thesis advisor Candès, Emmanuel J. (Emmanuel Jean)
Thesis advisor Montanari, Andrea
Degree committee member Candès, Emmanuel J. (Emmanuel Jean)
Degree committee member Montanari, Andrea
Associated with Stanford University, Department of Statistics.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Yu Bai
Note Submitted to the Department of Statistics
Thesis Ph.D., Stanford University, 2019
Location electronic resource

Access conditions

Copyright
© 2019 by Yu Bai
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
