Convex optimization for neural networks
Abstract/Contents
- Abstract
- Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical tuning to produce successful models and lack a clear theoretical foundation. In this thesis, we examine the use of convex optimization theory to improve the training of neural networks and to provide a better interpretation of their optimal weights. We focus on two-layer neural networks with piecewise linear activations and show that they can be formulated as finite-dimensional convex programs with a regularization term that promotes sparsity, a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite-width neural networks and then describe these architectures equivalently as high-dimensional convex models. Remarkably, the worst-case complexity of solving the convex program is polynomial in the number of samples and the number of neurons when the rank of the data matrix is bounded, as is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyperparameter tuning, unlike non-convex methods. Due to convexity, optimizer hyperparameters such as initialization, batch size, and step size schedule have no effect on the final model. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters. In the remaining parts of the thesis, we first extend the analysis to show that certain standard two- and three-layer Convolutional Neural Networks (CNNs) can be globally optimized in fully polynomial time.
Unlike for the fully connected networks studied in the first part, we prove that these equivalent convex characterizations of CNNs have fully polynomial complexity in all input dimensions without resorting to any approximation technique, and therefore enjoy significant computational complexity improvements. We then discuss extensions of our convex analysis to various neural network architectures, including vector-output networks, batch normalization, Generative Adversarial Networks (GANs), deeper architectures, and threshold networks.
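The convex reformulation described in the abstract can be illustrated with a short NumPy sketch. This is a heavily simplified illustration, not the thesis's exact program: the ReLU activation patterns D_i are subsampled with random gate vectors rather than fully enumerated over the hyperplane arrangements of the data, the cone constraints of the exact convex program are dropped (leaving an unconstrained relaxation), and the resulting group-Lasso objective is minimized by plain proximal gradient descent. All variable names and the toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 2
X = rng.standard_normal((n, d))   # toy data matrix
y = rng.standard_normal(n)        # toy labels

# Candidate ReLU activation patterns D_i = diag(1[X u >= 0]), generated by
# random gate vectors u. The exact formulation enumerates every distinct
# pattern induced by the hyperplane arrangement of X; sampling is a heuristic.
patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int))
            for _ in range(200)}
D = np.array(sorted(patterns), dtype=float)   # (P, n) binary masks
P = len(D)

beta = 0.1                 # group-Lasso regularization strength
V = np.zeros((P, d))       # convex variables v_i (one group per pattern)
W = np.zeros((P, d))       # convex variables w_i

def predict(V, W):
    # f(X) = sum_i D_i X (v_i - w_i)
    return ((X @ (V - W).T) * D.T).sum(axis=1)

# Proximal gradient on the relaxed objective
#   0.5 * ||f(X) - y||^2 + beta * sum_i (||v_i||_2 + ||w_i||_2)
step = 1.0 / (P * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz bound
for _ in range(3000):
    r = predict(V, W) - y                     # residual
    G = (D * r) @ X                           # (P, d): grad w.r.t. each v_i
    for M, grad in ((V, G), (W, -G)):
        Z = M - step * grad                   # gradient step
        nz = np.linalg.norm(Z, axis=1, keepdims=True)
        # Group soft-thresholding: zeros out entire neurons, giving sparsity.
        M[:] = np.maximum(0.0, 1 - step * beta / np.maximum(nz, 1e-12)) * Z

loss = 0.5 * np.sum((predict(V, W) - y) ** 2)
print(f"{P} sampled patterns, fit loss {loss:.3f}")
```

Because the problem is convex, the result is insensitive to initialization and step-size schedule; the surviving nonzero groups (v_i, w_i) correspond to the active neurons of an equivalent two-layer ReLU network.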
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2023
Publication date | 2023
Issuance | monographic
Language | English
Creators/Contributors
Author | Ergen, Tolga
---|---
Degree supervisor | Pilanci, Mert
Thesis advisor | Pilanci, Mert
Thesis advisor | Boyd, Stephen P.
Thesis advisor | Weissman, Tsachy
Degree committee member | Boyd, Stephen P.
Degree committee member | Weissman, Tsachy
Associated with | Stanford University, School of Engineering
Associated with | Stanford University, Department of Electrical Engineering
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Tolga Ergen.
---|---
Note | Submitted to the Department of Electrical Engineering.
Thesis | Thesis (Ph.D.)--Stanford University, 2023.
Location | https://purl.stanford.edu/sv935mh9248
Access conditions
- Copyright
- © 2023 by Tolga Ergen
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License (CC BY-NC 3.0).