Topics in robust mean estimation and applications to importance sampling


Abstract/Contents

Abstract
The sample mean is often used to aggregate different unbiased estimates of a real parameter, producing a final estimate that is unbiased but possibly with high variance. This thesis proposes two new robust estimators that can adaptively trade off some bias for variance, resulting in more competitive non-parametric mean estimators. The first estimator winsorizes the sample mean in a data-dependent way. The threshold level at which to winsorize is determined by a concrete version of the Balancing Principle, also known as Lepski's Method. The procedure chooses a threshold level among a pre-defined set by roughly balancing the bias and variance of the estimator when winsorized at different levels. While the assumptions of the Balancing Principle are probabilistic, in the importance sampling setting it is possible to bound such probabilities to obtain a finite-sample theorem yielding a principled way to perform winsorization with optimality guarantees. The second estimator introduces an aggregation rule that roughly interpolates between the sample mean and median, resulting in estimates with much smaller variance at the expense of bias. While the procedure is non-parametric, its squared bias is asymptotically negligible relative to the variance, similar to maximum likelihood estimators. The estimator is consistent, and concentration bounds for its bias and L1 error are derived, as well as a fast, non-randomized approximating algorithm that enjoys similar theoretical properties. The empirical performance of the estimators is examined on real and simulated data, with a focus on importance sampling. They generally match the performance of the sample mean in low-variance settings, while exhibiting far better results in high-variance scenarios.
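To make the first idea more concrete, the following is a minimal illustrative sketch of a Lepski-style threshold choice for a winsorized mean. It is not the thesis's procedure: the function names, the symmetric clipping to [-t, t], the plug-in standard error, and the constant c = 2.0 are all assumptions made here for illustration. The rule picks the smallest candidate threshold whose estimate agrees, up to c times the combined standard errors, with the estimates at every larger (less biased, more variable) threshold.

```python
import math
import random
import statistics


def winsorized_mean(xs, t):
    """Return (mean, standard error) of the sample clipped to [-t, t]."""
    clipped = [max(-t, min(t, x)) for x in xs]
    mean = statistics.fmean(clipped)
    se = statistics.stdev(clipped) / math.sqrt(len(clipped))
    return mean, se


def balanced_winsorized_mean(xs, thresholds, c=2.0):
    """Lepski-style selection over a pre-defined set of thresholds.

    Hypothetical rule (an assumption, not the thesis's exact criterion):
    accept the smallest threshold t_j such that, for every larger t_i,
    |mean_j - mean_i| <= c * (se_i + se_j).
    """
    ts = sorted(thresholds)
    fits = [winsorized_mean(xs, t) for t in ts]
    for j, (m_j, se_j) in enumerate(fits):
        later = fits[j + 1:]
        if all(abs(m_j - m_i) <= c * (se_i + se_j) for m_i, se_i in later):
            return ts[j], m_j
    # Not reached: the largest threshold has nothing to be compared with,
    # so the check above always succeeds for it.
    return ts[-1], fits[-1][0]


# Usage on a heavy-tailed sample (Pareto with shape 1.5, infinite variance).
random.seed(0)
sample = [random.paretovariate(1.5) for _ in range(2000)]
chosen_t, estimate = balanced_winsorized_mean(sample, [2, 5, 10, 50, 100, 1000])
print(chosen_t, estimate, statistics.fmean(sample))
```

The candidate grid and the constant c control how aggressively the rule trades bias for variance; in this sketch both are arbitrary and would need to be calibrated for any real use.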

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2019; ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Najberg Orenstein, Paulo
Degree supervisor Diaconis, Persi
Thesis advisor Diaconis, Persi
Thesis advisor Chatterjee, Sourav
Thesis advisor Wong, Wing Hung
Degree committee member Chatterjee, Sourav
Degree committee member Wong, Wing Hung
Associated with Stanford University, Department of Statistics.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Paulo Najberg Orenstein.
Note Submitted to the Department of Statistics.
Thesis Thesis Ph.D. Stanford University 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Paulo Najberg Orenstein
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
