Some results in high dimensional statistics : iterative algorithms and regression with missing data
- We study two problems at the interface between statistics and computation. In the first part of this thesis, we consider the problem of fitting a model to data by applying an iterative algorithm to perform empirical risk minimization. While the choice of iterative algorithm determines both the speed with which a fitted model is produced and the statistical quality of that model, algorithm selection is often performed heuristically, via either expensive trial and error or an appeal to (potentially) conservative worst-case efficiency estimates. Motivated by this, we develop a framework, based on Gaussian comparison inequalities, to rigorously compare different iterative procedures for empirical risk minimization in an average-case setting. In turn, we use this framework to demonstrate concrete separations in the convergence behavior of several iterative algorithms and to reveal some nonstandard convergence phenomena. In the second part of this thesis, we turn to parametric models with missing data in high dimensions. While this problem is well understood in the classical fixed-dimension, infinite-sample scaling, the situation changes drastically when the data are high dimensional. First, the high-dimensional nature of the problem, which ensures that most samples contain missing entries, removes complete-case analysis from the statistician's toolbox. Second, the log-likelihood may be nonconcave, in which case maximum likelihood estimation may prove computationally infeasible. Motivated by these issues, we study imputation-based methodology. In particular, we show that conditional mean imputation followed by a complete-data method yields minimax optimal estimators in the linear model. We then show that the situation is quite different in the logistic model: in a stylized setting, conditional mean imputation can yield inconsistent estimators which nonetheless match the (Bayes) optimal prediction error.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Verchand, Kabir Aladin
|Degree committee member
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Statement of responsibility
|Kabir Aladin Verchand.
|Submitted to the Department of Electrical Engineering.
|Thesis Ph.D. Stanford University 2023.
- © 2023 by Kabir Aladin Verchand
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).