Some results in high dimensional statistics : iterative algorithms and regression with missing data


Abstract/Contents

Abstract
We study two problems at the interface between statistics and computation. In the first part of this thesis, we consider the problem of fitting a model to data by applying an iterative algorithm to perform empirical risk minimization. While the choice of iterative algorithm determines both the speed with which a fitted model is produced and the statistical quality of that model, algorithm selection is often performed heuristically: via either expensive trial-and-error or appeal to potentially conservative worst-case efficiency estimates. Motivated by this, we develop a framework---based on Gaussian comparison inequalities---to rigorously compare and contrast different iterative procedures for empirical risk minimization in an average-case setting. In turn, we use this framework to demonstrate concrete separations in the convergence behavior of several iterative algorithms and to reveal some nonstandard convergence phenomena. In the second part of this thesis, we turn to parametric models with missing data in high dimensions. While this problem is well understood in the classical fixed-dimension, infinite-sample scaling, the situation changes drastically when the data is high dimensional. First, the high-dimensional nature of the problem---which ensures that most samples contain missing entries---removes complete case analysis from the statistician's toolbox. Second, the log-likelihood may be nonconcave, in which case maximum likelihood estimation may prove computationally infeasible. Motivated by these issues, we study imputation-based methodology. In particular, we show that conditional mean imputation followed by a complete data method yields minimax optimal estimators in the linear model. We then show that the situation is quite different in the logistic model: in a stylized setting, conditional mean imputation can yield inconsistent estimators that nonetheless match the (Bayes) optimal prediction error.
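The imputation-then-estimate pipeline described in the abstract can be sketched in a few lines. The following is a hypothetical illustration (not code from the thesis): Gaussian covariates with a known covariance, entries missing completely at random, each missing entry replaced by its conditional mean given the observed entries of the same sample, and ordinary least squares run on the imputed design matrix. The covariance `Sigma`, missingness rate, and dimensions are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not from the thesis): equicorrelated
# Gaussian covariates, 20% of entries missing completely at random.
n, d = 500, 5
Sigma = 0.5 * np.ones((d, d)) + 0.5 * np.eye(d)
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
beta = np.ones(d)
y = X @ beta + rng.normal(size=n)

mask = rng.random((n, d)) < 0.2  # True marks a missing entry
X_obs = np.where(mask, np.nan, X)

# Conditional mean imputation: for mean-zero Gaussians,
# E[X_mis | X_obs = x_obs] = Sigma_mo Sigma_oo^{-1} x_obs.
X_imp = X_obs.copy()
for i in range(n):
    mis = mask[i]
    if not mis.any():
        continue  # fully observed sample
    obs = ~mis
    if not obs.any():
        X_imp[i] = 0.0  # nothing observed: impute the prior mean
        continue
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(mis, obs)]
    X_imp[i, mis] = S_mo @ np.linalg.solve(S_oo, X_obs[i, obs])

# Complete data method on the imputed design: ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X_imp, y, rcond=None)
```

Note that this sketch assumes the population covariance is known; in practice it would be estimated from the incomplete data, which is part of what makes the high-dimensional analysis delicate.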

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Verchand, Kabir Aladin
Degree supervisor Montanari, Andrea
Thesis advisor Montanari, Andrea
Thesis advisor Duchi, John
Thesis advisor Wootters, Mary
Degree committee member Duchi, John
Degree committee member Wootters, Mary
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Kabir Aladin Verchand.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2023.
Location https://purl.stanford.edu/qg055vr1907

Access conditions

Copyright
© 2023 by Kabir Aladin Verchand
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
