Some results in high dimensional statistics : iterative algorithms and regression with missing data

Verchand, Kabir Aladin

Abstract/Contents

Abstract: We study two problems at the interface between statistics and computation. In the first part of this thesis, we consider the problem of fitting a model to data by applying an iterative algorithm to perform empirical risk minimization. While the choice of iterative algorithm determines both the speed with which a fitted model is produced and the statistical quality of that model, algorithm selection is often performed heuristically: via either expensive trial-and-error or appeal to (potentially) conservative worst-case efficiency estimates. Motivated by this, we develop a framework, based on Gaussian comparison inequalities, to rigorously compare and contrast different iterative procedures for empirical risk minimization in an average-case setting. In turn, we use this framework to demonstrate concrete separations in the convergence behavior of several iterative algorithms as well as to reveal some nonstandard convergence phenomena. In the second part of this thesis, we turn to studying parametric models with missing data in high dimensions. While this problem is well understood in the classical fixed-dimension, infinite-sample scaling, the situation changes drastically when the data is high dimensional. First, the high-dimensional nature of the problem, which ensures that most samples contain missing entries, removes complete case analysis from the statistician's toolbox. Second, the log-likelihood may be nonconcave, in which case maximum likelihood estimation may prove computationally infeasible. Motivated by these issues, we study imputation-based methodology. In particular, we show that conditional mean imputation followed by a complete data method yields minimax optimal estimators in the linear model. We then show that the situation is quite different in the logistic model: in a stylized setting, conditional mean imputation can yield inconsistent estimators which nonetheless match the (Bayes) optimal prediction error.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Verchand, Kabir Aladin
Degree supervisor	Montanari, Andrea
Thesis advisor	Montanari, Andrea
Thesis advisor	Duchi, John
Thesis advisor	Wootters, Mary
Degree committee member	Duchi, John
Degree committee member	Wootters, Mary
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Kabir Aladin Verchand.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/qg055vr1907

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
