Regularization in high-dimensional statistics
- Modern datasets are growing in terms of samples, but even more so in terms of variables. We often encounter datasets where each sample is a time series, an image, or even a movie, so that each sample has thousands or even millions of variables. Classical statistical approaches are inadequate for such high-dimensional data because they rely on theoretical and computational tools developed without such data in mind. The work in this thesis seeks to close the gap between the growing size of emerging datasets and the capabilities of existing approaches to statistical estimation, inference, and computing. It focuses on two problems that arise in learning from high-dimensional data (as opposed to black-box approaches that yield no insight into the underlying data-generating process): (1) model selection and post-selection inference, i.e., discovering the latent low-dimensional structure in high-dimensional data; and (2) scalable statistical computing, i.e., designing scalable estimators and algorithms that avoid communication and minimize "passes" over the data. The work relies crucially on results from convex analysis and geometry; many of the algorithms and proofs are inspired by results from this beautiful but dusty corner of mathematics.
|Type of resource: electronic; electronic resource; remote
|1 online resource.
|Stanford University, Institute for Computational and Mathematical Engineering.
|Statement of responsibility: Submitted to the Institute for Computational and Mathematical Engineering.
|Thesis (Ph.D.)--Stanford University, 2015.
- © 2015 by Yuekai Sun
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).