Principal component analysis under extreme aspect ratios
Abstract/Contents
- Abstract
- Recent studies of high-dimensional principal component analysis (PCA) often assume the proportional growth asymptotic, where the sample size n and number of parameters p are comparable, with n and p tending to infinity and their ratio converging to a positive constant. Yet many datasets, perhaps most, have very different numbers of rows and columns. This thesis considers disproportional growth, where n and p are large though their ratio tends to zero or infinity. Either disproportional limit induces novel phenomena distinct from the proportional and fixed-p limits. Theory derived here shows that the displacement of the empirical singular values and vectors from their noise-free counterparts, and the associated phase transitions well known under proportional growth asymptotics, still occur in the disproportional setting. They must be quantified, however, on a novel scale of measurement that adjusts with the changing aspect ratio as the matrix size increases. In this setting, the top singular vectors corresponding to the longer of the two matrix dimensions are asymptotically uncorrelated with the noise-free signal.

We apply this theory to covariance estimation under the spiked model, where the population covariance is a low-rank perturbation of the identity. For each of 15 different loss functions, we exhibit in closed form new optimal shrinkage and thresholding rules; optimality takes the particularly strong form of unique asymptotic admissibility. Our optimal procedures demand extensive eigenvalue shrinkage and offer substantial performance benefits over the standard sample covariance estimator. These procedures are closely related to optimal shrinkage rules for denoising of spiked Wigner matrices. Practitioners may ask whether to apply the procedures of the proportional or the disproportional asymptotic framework. Conveniently, we show that one can be framework-agnostic: one unified set of closed forms, depending only on the aspect ratio of the given data, offers full asymptotic optimality under either framework.

Finally, we consider estimation of the principal components of large, asymmetric tensors of arbitrary order. We study matricization approaches, which construct specific matrices from such tensors and then apply spectral methods; these strategies produce extremely tall matrices that naturally fit within the disproportional growth framework. For tensor unfolding, partial tracing, and a new method, successive contraction, we identify sharp thresholds in signal-to-noise ratio above which the signal is partially recovered. Moreover, we prove that above the partial recovery threshold of unfolding, successive contraction asymptotically achieves exact signal recovery.
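The aspect-ratio-dependent scale and the link to spiked Wigner matrices can be made concrete with a short Python sketch. It uses the classical spiked-covariance formulas from the proportional-growth literature: the map from a population spike ℓ to its limiting sample eigenvalue, its inverse, the limiting squared cosine between sample and population eigenvectors, and the rule ℓc² + s², which we take as the Frobenius-loss shrinker for illustration. We assume unit noise variance and γ = p/n. This is not the thesis's set of 15 rules; the function names and the rescaling (λ − 1)/√γ for p ≪ n are assumptions chosen for the example. Writing ℓ = 1 + τ√γ and letting γ → 0, the rescaled eigenvalue tends to τ + 1/τ and the squared cosine tends to 1 − 1/τ², which are the familiar spiked Wigner formulas.

```python
import math


def bulk_edge(gamma):
    """Upper edge (1 + sqrt(gamma))^2 of the Marchenko-Pastur bulk, unit noise variance."""
    return (1.0 + math.sqrt(gamma)) ** 2


def spike_from_eigenvalue(lam, gamma):
    """Invert lam = ell * (1 + gamma / (ell - 1)); return None for lam inside the bulk."""
    if lam <= bulk_edge(gamma):
        return None
    b = lam + 1.0 - gamma
    return (b + math.sqrt(b * b - 4.0 * lam)) / 2.0


def cosine_squared(ell, gamma):
    """Limiting squared cosine between sample and population spike eigenvectors."""
    return (1.0 - gamma / (ell - 1.0) ** 2) / (1.0 + gamma / (ell - 1.0))


def shrink_frobenius(lam, gamma):
    """Assumed Frobenius-loss rule eta = ell * c^2 + s^2; bulk eigenvalues go to 1."""
    ell = spike_from_eigenvalue(lam, gamma)
    if ell is None:
        return 1.0
    c2 = cosine_squared(ell, gamma)
    return ell * c2 + (1.0 - c2)


def rescaled_eigenvalue(ell, gamma):
    """Limiting sample eigenvalue for spike ell, on the (lam - 1) / sqrt(gamma) scale."""
    lam = ell * (1.0 + gamma / (ell - 1.0))
    return (lam - 1.0) / math.sqrt(gamma)


# Disproportional regime p << n: write ell = 1 + tau * sqrt(gamma) with tau = 2.
gamma = 1e-6
ell = 1.0 + 2.0 * math.sqrt(gamma)
print(rescaled_eigenvalue(ell, gamma))  # close to tau + 1/tau = 2.5
print(cosine_squared(ell, gamma))       # close to 1 - 1/tau^2 = 0.75
```

As a usage example, with γ = 0.25 a sample eigenvalue of 3.375 maps back to ℓ = 3 and is shrunk to 8/3. Eigenvalues at or below the bulk edge (1 + √γ)² = 2.25 are set to 1.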
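Tensor unfolding, the simplest of the matricization approaches named in the abstract, can be sketched in plain Python. The sketch rests on several assumptions. It uses a hypothetical order-3 rank-one model T = β x⊗y⊗z + noise, with unit-norm x, y, z and independent standard normal noise. It takes the mode-1 unfolding, an n1 × (n2·n3) matrix that is the transpose of the tall matrix and has the same singular vectors. It finds the top left singular vector by power iteration on the small Gram matrix. Partial tracing and successive contraction are not reproduced here. As a rough guide, the standard rectangular spiked-matrix threshold suggests the spike in the unfolding becomes detectable once β exceeds roughly (n1·n2·n3)^(1/4), about 9.5 for n1 = n2 = n3 = 20. This back-of-envelope figure is not the thesis's sharp threshold.

```python
import math
import random


def random_unit_vector(n, rng):
    """Draw a uniformly random unit vector in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def spiked_tensor(beta, x, y, z, rng):
    """Hypothetical test model: T[i][j][k] = beta * x_i * y_j * z_k + N(0, 1) noise."""
    return [[[beta * x[i] * y[j] * z[k] + rng.gauss(0.0, 1.0)
              for k in range(len(z))]
             for j in range(len(y))]
            for i in range(len(x))]


def unfold_mode1(t):
    """Mode-1 unfolding: row i lists T[i][j][k] with k varying fastest (n1 x n2*n3)."""
    return [[entry for row in slab for entry in row] for slab in t]


def top_left_singular_vector(m, iters=200, seed=0):
    """Power iteration on the small n1 x n1 Gram matrix M M^T."""
    n = len(m)
    gram = [[sum(a * b for a, b in zip(m[i], m[j])) for j in range(n)]
            for i in range(n)]
    v = random_unit_vector(n, random.Random(seed))
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


def unfolding_overlap(beta, n=20, seed=1):
    """|<x, x_hat>| where x_hat is the top left singular vector of the unfolding."""
    rng = random.Random(seed)
    x, y, z = (random_unit_vector(n, rng) for _ in range(3))
    t = spiked_tensor(beta, x, y, z, rng)
    x_hat = top_left_singular_vector(unfold_mode1(t), seed=seed + 1)
    return abs(sum(a * b for a, b in zip(x, x_hat)))
```

With these settings, unfolding_overlap(40.0) should return an overlap close to 1. By contrast, unfolding_overlap(2.0) is far below the rough threshold and should give an overlap comparable to that of a random unit vector, on the order of 1/√20.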
Description
Field | Value |
---|---|
Type of resource | text |
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2023 |
Publication date | 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Role | Name |
---|---|
Author | Feldman, Michael Jacob |
Degree supervisor | Donoho, David Leigh |
Thesis advisor | Donoho, David Leigh |
Thesis advisor | Johnstone, Iain |
Thesis advisor | Montanari, Andrea |
Degree committee member | Johnstone, Iain |
Degree committee member | Montanari, Andrea |
Associated with | Stanford University, School of Humanities and Sciences |
Associated with | Stanford University, Department of Statistics |
Subjects
Field | Value |
---|---|
Genre | Theses |
Genre | Text |
Bibliographic information
Field | Value |
---|---|
Statement of responsibility | Michael Jacob Feldman |
Note | Submitted to the Department of Statistics. |
Thesis | Ph.D. thesis, Stanford University, 2023 |
Location | https://purl.stanford.edu/zx920vn4786 |
Access conditions
- Copyright: © 2023 by Michael Jacob Feldman
- License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).