Convex optimization for neural networks

Abstract/Contents

Abstract
Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on non-convex optimization heuristics. Traditional approaches to training DNNs often require costly empirical tuning to produce successful models and lack a clear theoretical foundation. In this thesis, we examine the use of convex optimization theory to improve the training of neural networks and to provide a better interpretation of their optimal weights. We focus on two-layer neural networks with piecewise linear activations and show that they can be formulated as finite-dimensional convex programs with a sparsity-promoting regularization term, a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite-width neural networks and then characterize these architectures equivalently as high-dimensional convex models. Remarkably, the worst-case complexity of solving the convex program is polynomial in the number of samples and the number of neurons when the rank of the data matrix is bounded, as is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. Unlike non-convex methods, our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyper-parameter tuning. Due to convexity, optimizer hyperparameters such as initialization, batch size, and step size schedule have no effect on the final model. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters. In the remaining parts of the thesis, we first extend the analysis to show that certain standard two- and three-layer Convolutional Neural Networks (CNNs) can be globally optimized in fully polynomial time. In contrast to the fully connected networks studied in the first part, we prove that these equivalent characterizations of CNNs have fully polynomial complexity in all input dimensions without resorting to any approximation techniques, and therefore enjoy significant computational complexity improvements. We then discuss extensions of our convex analysis to various neural network architectures, including vector-output networks, batch normalization, Generative Adversarial Networks (GANs), deeper architectures, and threshold networks.
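
To make the abstract's claim that such models "can be trained using standard convex solvers" concrete, the sketch below sets up a small group-Lasso-regularized convex program of the general form studied in this line of work and hands it to cvxpy. It is a minimal illustration under assumed data, dimensions, and a randomly sampled set of ReLU activation patterns; none of these specific choices are taken from the thesis itself.

# Illustrative sketch only: a group-Lasso-regularized convex program of the
# general form described in the abstract, solved with cvxpy as the "standard
# convex solver". Data, dimensions, and the random sampling of ReLU activation
# patterns are assumptions for this example, not taken from the thesis.
import numpy as np
import cvxpy as cp

np.random.seed(0)
n, d, P = 50, 5, 20                      # samples, features, sampled activation patterns
X = np.random.randn(n, d)
y = np.random.randn(n)
beta = 1e-3                              # group-sparsity regularization strength

# Sample a subset of ReLU activation patterns D_i = diag(1[X g >= 0]) from
# random directions g (a practical surrogate for enumerating the full
# hyperplane arrangement of the data).
G = np.random.randn(d, P)
D = (X @ G >= 0).astype(float)           # n x P matrix of 0/1 patterns

V = cp.Variable((d, P))                  # weights of "positive" neurons
W = cp.Variable((d, P))                  # weights of "negative" neurons

# Model output: sum_i D_i X (v_i - w_i), fit to y with a group-Lasso penalty.
output = cp.sum(cp.multiply(D, X @ (V - W)), axis=1)
reg = cp.sum(cp.norm(V, 2, axis=0)) + cp.sum(cp.norm(W, 2, axis=0))

# Constraints keeping each neuron consistent with its activation pattern.
constraints = []
for i in range(P):
    Si = np.diag(2 * D[:, i] - 1)        # (2 D_i - I)
    constraints += [Si @ X @ V[:, i] >= 0, Si @ X @ W[:, i] >= 0]

prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(output - y) + beta * reg),
                  constraints)
prob.solve()
print("optimal objective:", prob.value)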

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Ergen, Tolga
Degree supervisor Pilanci, Mert
Thesis advisor Pilanci, Mert
Thesis advisor Boyd, Stephen P
Thesis advisor Weissman, Tsachy
Degree committee member Boyd, Stephen P
Degree committee member Weissman, Tsachy
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Tolga Ergen.
Note Submitted to the Department of Electrical Engineering.
Thesis Ph.D., Stanford University, 2023.
Location https://purl.stanford.edu/sv935mh9248

Access conditions

Copyright
© 2023 by Tolga Ergen
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
