Principal component analysis under extreme aspect ratios

Abstract
Recent studies of high-dimensional principal component analysis (PCA) often assume the proportional growth asymptotic, in which the sample size n and the number of parameters p are comparable: both tend to infinity and their ratio converges to a positive constant. Yet many datasets, perhaps most, have very different numbers of rows and columns. This thesis considers disproportional growth, where n and p are both large but their ratio tends to zero or infinity. Either disproportional limit induces novel phenomena distinct from those of the proportional and fixed-p limits. Theory derived here shows that the displacement of the empirical singular values and vectors from their noise-free counterparts, and the associated phase transitions, which are well known under proportional growth asymptotics, still occur in the disproportional setting. They must be quantified, however, on a novel scale of measurement that adjusts with the changing aspect ratio as the matrix size increases. In this setting, the top singular vectors corresponding to the longer of the two matrix dimensions are asymptotically uncorrelated with the noise-free signal. We apply this theory to covariance estimation under the spiked model, where the population covariance is a low-rank perturbation of the identity. For each of 15 different loss functions, we exhibit in closed form new optimal shrinkage and thresholding rules; optimality takes the particularly strong form of unique asymptotic admissibility. Our optimal procedures demand extensive eigenvalue shrinkage and offer substantial performance benefits over the standard sample covariance estimator. These procedures are closely related to optimal shrinkage rules for denoising spiked Wigner matrices. Practitioners may ask whether to apply the procedures of the proportional or the disproportional asymptotic framework.
Conveniently, we show that one can be framework-agnostic: a single unified set of closed forms, depending only on the aspect ratio of the given data, offers full asymptotic optimality under either framework. Finally, we consider estimation of the principal components of large, asymmetric tensors of arbitrary order. We study matricization approaches, which construct specific matrices from such tensors and then apply spectral methods; these strategies produce extremely tall matrices that naturally fit within the disproportional growth framework. For tensor unfolding, partial tracing, and a new method, successive contraction, we identify sharp signal-to-noise-ratio thresholds above which the signal is partially recovered. Moreover, we prove that above the partial-recovery threshold of unfolding, successive contraction asymptotically achieves exact signal recovery.
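The aspect-ratio-dependent rescaling described in the abstract can be made concrete with a small simulation. The Python sketch below is an illustration only, not code from the thesis. It assumes Gaussian data, a single spike of strength `spike` along the first coordinate (population covariance I + spike * e1 e1^T), and the centering and scaling (lambda - (1 + gamma)) / sqrt(gamma) with gamma = p / n. That normalization is a standard one under which the Marchenko-Pastur bulk approaches the semicircle law on [-2, 2] as gamma grows; it is used here as a plausible stand-in for the thesis's measurement scale, not as its exact definition. The helper `normalized_sample_eigenvalues` is hypothetical.

```python
import numpy as np


def normalized_sample_eigenvalues(n, p, spike, seed=0):
    """Hypothetical helper: simulate an n x p Gaussian data matrix from the
    spiked model Sigma = I + spike * e1 e1^T and return the n nonzero
    sample-covariance eigenvalues, in descending order, on the assumed
    rescaled axis (lambda - (1 + gamma)) / sqrt(gamma), gamma = p / n."""
    rng = np.random.default_rng(seed)
    gamma = p / n
    X = rng.standard_normal((n, p))
    # Sigma^{1/2} = I + (sqrt(1 + spike) - 1) e1 e1^T, so scaling the first
    # column by sqrt(1 + spike) gives rows with covariance Sigma.
    X[:, 0] *= np.sqrt(1.0 + spike)
    # For p > n, the nonzero eigenvalues of the p x p matrix X^T X / n equal
    # those of the much smaller n x n Gram matrix X X^T / n.
    eigs = np.linalg.eigvalsh(X @ X.T / n)[::-1]  # eigvalsh is ascending
    return (eigs - (1.0 + gamma)) / np.sqrt(gamma)


n, p = 200, 20_000  # gamma = p / n = 100, a strongly disproportional shape
noise = normalized_sample_eigenvalues(n, p, spike=0.0)
spiked = normalized_sample_eigenvalues(n, p, spike=60.0)
print("noise-only, top 3:", noise[:3])
print("spiked, top 3:   ", spiked[:3])
```

With gamma = 100 the raw eigenvalues are all of order 100; after rescaling, the noise-only eigenvalues should fill roughly [-2, 2], while a spike of strength 60, well above the classical detection threshold sqrt(gamma) = 10, should produce a top eigenvalue clearly separated from that bulk. Working with the n x n Gram matrix instead of the p x p covariance keeps the computation cheap.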
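To make the matricization idea concrete, here is a minimal Python sketch of the simplest strategy named in the abstract, tensor unfolding, for an order-3 rank-one tensor. It is not code from the thesis and does not implement partial tracing or successive contraction. It assumes iid standard Gaussian noise, unit-norm factors x, y, z, and a signal strength `beta`; the helper `unfolding_estimate` and the chosen dimensions are hypothetical.

```python
import numpy as np


def unit(v):
    """Normalize a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)


def unfolding_estimate(beta, dims=(50, 60, 70), seed=0):
    """Hypothetical helper: build T = beta * (x outer y outer z) + W with iid
    N(0, 1) noise W, unfold T along mode 1, and return |<x_hat, x>|, where
    x_hat is the top left singular vector of the n1 x (n2 * n3) unfolding."""
    rng = np.random.default_rng(seed)
    n1, n2, n3 = dims
    x, y, z = (unit(rng.standard_normal(k)) for k in dims)
    signal = beta * np.einsum("i,j,k->ijk", x, y, z)
    T = signal + rng.standard_normal(dims)
    # Mode-1 unfolding: row i is the flattened slice T[i, :, :], so the
    # signal part is the rank-one matrix beta * x (y kron z)^T.
    M = T.reshape(n1, n2 * n3)
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return abs(U[:, 0] @ x)


weak = unfolding_estimate(beta=10.0)
strong = unfolding_estimate(beta=60.0)
print("beta = 10:", weak)
print("beta = 60:", strong)
```

Here the mode-1 unfolding is a 50 x 4,200 matrix (extremely tall once transposed), the kind of shape the disproportional framework addresses. For this noise scaling, standard spiked rectangular-matrix theory places the spectral threshold for beta at about (n1 * n2 * n3)^(1/4), roughly 21 for these dimensions, so beta = 10 should leave x_hat nearly uncorrelated with x, while beta = 60 should give a correlation close to 1.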

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Feldman, Michael Jacob
Degree supervisor Donoho, David Leigh
Thesis advisor Donoho, David Leigh
Thesis advisor Johnstone, Iain
Thesis advisor Montanari, Andrea
Degree committee member Johnstone, Iain
Degree committee member Montanari, Andrea
Associated with Stanford University, School of Humanities and Sciences
Associated with Stanford University, Department of Statistics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Michael Jacob Feldman.
Note Submitted to the Department of Statistics.
Thesis Ph.D., Stanford University, 2023.
Location https://purl.stanford.edu/zx920vn4786

Access conditions

Copyright
© 2023 by Michael Jacob Feldman
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).