Selecting the dimension of a subspace in principal component analysis and canonical correlation analysis
- Dimension reduction is common practice in statistical data analysis as modern data sets grow larger and more complex. Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are two of the most popular methods for dimension reduction. Despite their popularity, there is no widely adopted standard approach for selecting the dimension of the subspace obtained by PCA or CCA. To address this issue, we propose a novel method that uses the hypothesis testing framework to test whether the subspace currently selected by PCA or CCA captures all the statistically significant signals in the given data set. Whereas existing hypothesis testing approaches do not control the type 1 error exactly and lose power in some scenarios, the proposed method provides exact type 1 error control along with good power for detecting signals. Central to our work is the post-selection inference framework, which facilitates valid inference after data-driven model selection; the proposed method achieves exact type 1 error control by conditioning on the selection event that leads to the inference.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Stanford University, Department of Statistics.
|Statement of responsibility
|Submitted to the Department of Statistics.
|Thesis (Ph.D.)--Stanford University, 2016.
- © 2016 by Yunjin Choi
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).