A modern maximum likelihood theory for high-dimensional logistic regression

Sur, Pragya

Abstract/Contents

Abstract: Logistic regression is arguably the most widely used and studied non-linear model in statistics. Statistical inference based on classical maximum likelihood theory is ubiquitous in this context. This theory hinges on well-known fundamental results: (1) the maximum likelihood estimate (MLE) is asymptotically unbiased and normally distributed, (2) its variability can be quantified via the inverse Fisher information, and (3) the log-likelihood ratio (LLR) statistic is asymptotically chi-squared distributed. This thesis uncovers that in the common modern setting where the number of features and the sample size are both large and comparable, classical results are far from accurate. In fact, (1) the MLE is biased, (2) its variability is far greater than classical results predict, and (3) the LLR statistic is not chi-squared distributed. Consequently, p-values based on classical theory are completely invalid in such settings. This thesis provides a modern perspective on classical maximum likelihood theory in the context of logistic regression (developed jointly by the author and her collaborators). The contributions here are two-fold: first, it discovers a phase transition in the existence of the MLE and explicitly pins down the phase transition curve; second, in the regime where the MLE is finite, it characterizes the asymptotic behavior of the MLE and the LLR for a class of covariate distributions, under the aforementioned high-dimensional regime. Empirical evidence demonstrates that this asymptotic theory provides accurate inference in finite samples and is robust to certain violations of the underlying assumptions. Practical implementation of these results necessitates the estimation of a single scalar, the overall signal strength; a procedure for estimating this parameter is also discussed. This asymptotic theory can be extended to characterize distributions of penalized maximum likelihood estimators in some settings.
Along the way, this thesis surveys relevant works in the field of high-dimensional inference, particularly those developing methodology for valid inference in high-dimensional regression problems.
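
For orientation, the three classical results named in the abstract can be written out as follows. This is a sketch in standard textbook notation, not notation taken from the thesis: the symbols $n$ (sample size), $p$ (number of features, held fixed as $n \to \infty$), $\beta$, $\hat{\beta}$, $I(\beta)$ (Fisher information per observation), $\ell$ (log-likelihood) and $\Lambda$ are assumptions introduced here for illustration.

```latex
% Classical (fixed p, n -> infinity) maximum likelihood theory.
% Notation is assumed for illustration and is not the thesis's own.

% (1)-(2): the MLE is asymptotically unbiased and normal, with
% covariance given by the inverse Fisher information.
\[
  \sqrt{n}\,\bigl(\hat{\beta} - \beta\bigr)
  \;\xrightarrow{d}\;
  \mathcal{N}\bigl(0,\; I(\beta)^{-1}\bigr),
\]
% For logistic regression with covariate vector x and
% sigma(t) = 1 / (1 + e^{-t}):
\[
  I(\beta) \;=\; \mathbb{E}\Bigl[\sigma(x^\top\beta)\,
  \bigl(1 - \sigma(x^\top\beta)\bigr)\, x x^\top\Bigr].
\]

% (3): Wilks' theorem. When testing k coefficients equal to zero,
% with \hat{\beta}_0 the MLE under the null hypothesis,
\[
  2\Lambda \;=\; 2\,\bigl[\ell(\hat{\beta}) - \ell(\hat{\beta}_0)\bigr]
  \;\xrightarrow{d}\; \chi^2_k .
\]
```

These limits hold with $p$ fixed as $n$ grows. The abstract's claim is that they stop being accurate when $p$ and $n$ are both large and of comparable size.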

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2019; ©2019
Publication date	2019
Issuance	monographic
Language	English

Creators/Contributors

Author	Sur, Pragya
Degree supervisor	Candès, Emmanuel J. (Emmanuel Jean)
Thesis advisor	Candès, Emmanuel J. (Emmanuel Jean)
Thesis advisor	Johnstone, Iain
Thesis advisor	Montanari, Andrea
Degree committee member	Johnstone, Iain
Degree committee member	Montanari, Andrea
Associated with	Stanford University, Department of Statistics.

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Pragya Sur.
Note	Submitted to the Department of Statistics.
Thesis	Thesis Ph.D. Stanford University 2019.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
