Cross-validation and regression analysis in high-dimensional sparse linear models
Abstract/Contents
- Abstract
- Modern scientific research often involves experiments with at most hundreds of subjects but with tens of thousands of variables for every subject. The challenge of high dimensionality has reshaped statistical thinking and modeling. Variable selection plays a pivotal role in the high-dimensional data analysis, and the combination of sparsity and accuracy is crucial for statistical theory and practical applications. Regularization methods are attractive for tackling these sparsity and accuracy issues. The first part of this thesis studies two regularization methods. First, we consider the orthogonal greedy algorithm (OGA) used in conjunction with a high-dimensional information criterion introduced by Ing& Lai (2011). Although it has been shown to have excellent performance for weakly sparse regression models, one does not know a priori in practice that the actual model is weakly sparse, and we address this problem by developing a new cross-validation approach. OGA can be viewed as L0 regularization for weakly sparse regression models. When such sparsity fails, as revealed by the cross-validation analysis, we propose to use a new way to combine L1 and L2 penalties, which we show to have important advantages over previous regularization methods. The second part of the thesis develops a Monte Carlo Cross-Validation (MCCV) method to estimate the distribution of out-of-sample prediction errors when a training sample is used to build a regression model for prediction. Asymptotic theory and simulation studies show that the proposed MCCV method mimics the actual (but unknown) prediction error distribution even when the number of regressors exceeds the sample size. Therefore MCCV provides a useful tool for comparing the predictive performance of different regularization methods for real (rather than simulated) data sets.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2011 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Zhang, Feng | |
---|---|---|
Associated with | Stanford University, Department of Statistics | |
Primary advisor | Lai, T. L | |
Thesis advisor | Lai, T. L | |
Thesis advisor | Rajaratnam, Balakanapathy | |
Thesis advisor | Zhang, Nancy R. (Nancy Ruonan) | |
Advisor | Rajaratnam, Balakanapathy | |
Advisor | Zhang, Nancy R. (Nancy Ruonan) |
Subjects
Genre | Theses |
---|
Bibliographic information
Statement of responsibility | Feng Zhang. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2011. |
Location | electronic resource |
Access conditions
- Copyright
- © 2011 by Feng Zhang
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
Also listed in
Loading usage metrics...