Convex optimization for neural networks
Abstract/Contents
- Abstract
- Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical tuning to produce successful models and lack a clear theoretical foundation. In this thesis, we examine the use of convex optimization theory to improve the training of neural networks and to provide a better interpretation of their optimal weights. We focus on two-layer neural networks with piecewise linear activations and show that they can be formulated as finite-dimensional convex programs with a regularization term that promotes sparsity, a variant of group Lasso. We first utilize semi-infinite programming theory to prove strong duality for finite-width neural networks and then describe these architectures equivalently as high-dimensional convex models. Remarkably, the worst-case complexity of solving the convex program is polynomial in the number of samples and the number of neurons when the rank of the data matrix is bounded, as is the case in convolutional networks. To extend our method to training data of arbitrary rank, we develop a novel polynomial-time approximation scheme based on zonotope subsampling that comes with a guaranteed approximation ratio. Our convex models can be trained using standard convex solvers without resorting to heuristics or extensive hyperparameter tuning, unlike non-convex methods. Due to convexity, optimizer hyperparameters such as initialization, batch size, and step size schedule have no effect on the final model. Through extensive numerical experiments, we show that convex models can outperform traditional non-convex methods and are not sensitive to optimizer hyperparameters. In the remaining parts of the thesis, we first extend the analysis to show that certain standard two- and three-layer Convolutional Neural Networks (CNNs) can be globally optimized in fully polynomial time.
Unlike for the fully connected networks studied in the first part, we prove that these equivalent convex characterizations of CNNs have fully polynomial complexity in all input dimensions without resorting to any approximation technique, and therefore enjoy significant computational complexity improvements. We then discuss extensions of our convex analysis to various neural network architectures, including vector-output networks, batch normalization, Generative Adversarial Networks (GANs), deeper architectures, and threshold networks.
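The convex reformulation described in the abstract can be illustrated with a short NumPy sketch. This is a heavily simplified illustration, not the thesis's exact program: the ReLU activation patterns D_i are subsampled with random gate vectors rather than fully enumerated over the hyperplane arrangements of the data, the cone constraints of the exact convex program are dropped (leaving an unconstrained relaxation), and the resulting group-Lasso objective is minimized by plain proximal gradient descent. All variable names and the toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 2
X = rng.standard_normal((n, d))   # toy data matrix
y = rng.standard_normal(n)        # toy labels

# Candidate ReLU activation patterns D_i = diag(1[X u >= 0]), generated by
# random gate vectors u. The exact formulation enumerates every distinct
# pattern induced by the hyperplane arrangement of X; sampling is a heuristic.
patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int))
            for _ in range(200)}
D = np.array(sorted(patterns), dtype=float)   # (P, n) binary masks
P = len(D)

beta = 0.1                 # group-Lasso regularization strength
V = np.zeros((P, d))       # convex variables v_i (one group per pattern)
W = np.zeros((P, d))       # convex variables w_i

def predict(V, W):
    # f(X) = sum_i D_i X (v_i - w_i)
    return ((X @ (V - W).T) * D.T).sum(axis=1)

# Proximal gradient on the relaxed objective
#   0.5 * ||f(X) - y||^2 + beta * sum_i (||v_i||_2 + ||w_i||_2)
step = 1.0 / (P * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz bound
for _ in range(3000):
    r = predict(V, W) - y                     # residual
    G = (D * r) @ X                           # (P, d): grad w.r.t. each v_i
    for M, grad in ((V, G), (W, -G)):
        Z = M - step * grad                   # gradient step
        nz = np.linalg.norm(Z, axis=1, keepdims=True)
        # Group soft-thresholding: zeros out entire neurons, giving sparsity.
        M[:] = np.maximum(0.0, 1 - step * beta / np.maximum(nz, 1e-12)) * Z

loss = 0.5 * np.sum((predict(V, W) - y) ** 2)
print(f"{P} sampled patterns, fit loss {loss:.3f}")
```

Because the problem is convex, the result is insensitive to initialization and step-size schedule; the surviving nonzero groups (v_i, w_i) correspond to the active neurons of an equivalent two-layer ReLU network.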
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2023
Publication date | 2023
Issuance | monographic
Language | English
Creators/Contributors
Author | Ergen, Tolga
---|---
Degree supervisor | Pilanci, Mert
Thesis advisor | Pilanci, Mert
Thesis advisor | Boyd, Stephen P.
Thesis advisor | Weissman, Tsachy
Degree committee member | Boyd, Stephen P.
Degree committee member | Weissman, Tsachy
Associated with | Stanford University, School of Engineering
Associated with | Stanford University, Department of Electrical Engineering
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Tolga Ergen.
---|---
Note | Submitted to the Department of Electrical Engineering.
Thesis | Thesis (Ph.D.)--Stanford University, 2023.
Location | https://purl.stanford.edu/sv935mh9248
Access conditions
- Copyright
- © 2023 by Tolga Ergen
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License (CC BY-NC 3.0).