False discoveries with dependence : an application of objective inference

Pekelis, Leonid B; Stanford University, Department of Statistics.

Abstract/Contents

Abstract: Especially lately, increased availability of data and decreased barriers for data analysis, while promising for the popularity of statistics, have led to increased concern about personal biases and motives, causing distrust in published conclusions. We propose that inferential guarantees which hold regardless of individual methods and beliefs are a solution to building trust. This motivates defining an objective inference as one which is interpretable - no other information on the underlying data is needed to synthesize conclusions from results - and fair - interpretation of the same results across multiple observers with different preferences is equally easy. An inference which has these properties is more factual than one which does not. The properties of interpretable and fair are given decision-theoretic definitions in the context of hypothesis testing. They amount to a robustness requirement of null risk with respect to nuisance parameters and transformations of data. Two examples applying these properties are explored in depth. For the first example, we regain objectivity for false discovery analysis in the presence of dependent test statistics by deriving upper confidence bounds on the false discovery quantities: false discovery proportion (Fdp), Bayesian false discovery rate (Fdr), and local false discovery rate (fdr). These upper confidence bounds are uniform across all choices of hypothesis cutoffs, motivating plotting many cutoffs at once, or listing them in a table. Extensions to theoretical versus estimated null components are included. We call these the "U" methods, e.g. UFdp. These methods use derived covariance formulas for both the empirical process and density estimates from the Expectation-Maximization (EM) algorithm when data is correlated, and approximations to tail probabilities of the supremum of Gaussian processes over parameter sets embedded in metric spaces from Volume of Tubes and Double Sum arguments.
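For orientation, the three false discovery quantities and the uniform bound can be written as below. This is a sketch in common two-groups notation, not taken from the thesis: the symbols V(t), R(t), \pi_0, F_0, F, f_0, f and U_\alpha(t) are assumptions introduced here, and the thesis's own definitions may differ.

```latex
% Assumed notation: at cutoff t, R(t) hypotheses are rejected and V(t) of them are true nulls;
% \pi_0 is the null proportion, F_0 and f_0 the null tail area and density,
% F and f the corresponding mixture quantities.
\[
\mathrm{Fdp}(t) = \frac{V(t)}{\max\{R(t),\,1\}}, \qquad
\mathrm{Fdr}(t) = \frac{\pi_0 F_0(t)}{F(t)}, \qquad
\mathrm{fdr}(t) = \frac{\pi_0 f_0(t)}{f(t)}.
\]
% An upper confidence bound U_\alpha that is uniform over cutoffs satisfies
\[
\Pr\bigl\{\mathrm{Fdp}(t) \le U_\alpha(t) \ \text{for all cutoffs } t\bigr\} \ge 1 - \alpha .
\]
```

Because the bound holds simultaneously for every t, a reader can inspect many cutoffs on one plot or in one table without invalidating the guarantee.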
For the second example, we consider online experimentation, or sequential hypothesis testing of streaming data on the internet where practitioners have short term incentives to subvert statistical protocol. Defining an always valid p-value process that is super uniform regardless of the stopping time chosen for the experiment allows for an objective inference. We explore a particular always valid p-value based on the mixture sequential probability ratio test (mSPRT), and show it has asymptotically optimal risk, even among tests which are allowed to fix the stopping time and maximal sample size truncation in advance. We also show how to choose the tuning parameters of an mSPRT to gain good sub-asymptotic performance. Finally, we compare multiple always valid p-value processes at once through sequential false discovery analysis. Even though the p-value processes may themselves be independent, the choice of mutual stopping time can introduce dependence, biasing false discovery procedures. We derive false discovery rate bounds from applying the Benjamini-Hochberg (BH) procedure at any stopping time, and examine them for three common classes of stopping times.
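The two ingredients named above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes normal observations with known variance sigma2, a normal mixing distribution with variance tau2 centred at the null value theta0, and the textbook BH step-up rule. The function names msprt_p_values and benjamini_hochberg, and all parameter names, are hypothetical.

```python
import math


def msprt_p_values(xs, theta0=0.0, sigma2=1.0, tau2=1.0):
    """Running always valid p-values from a normal-mixture SPRT.

    Assumption: xs are i.i.d. N(theta, sigma2) with sigma2 known, and the
    mixing distribution over theta is N(theta0, tau2).
    """
    p = 1.0          # p_0 = 1
    total = 0.0
    out = []
    for n, x in enumerate(xs, start=1):
        total += x
        mean = total / n
        denom = sigma2 + n * tau2
        # Log of the mixture likelihood ratio Lambda_n against theta0.
        log_lr = 0.5 * math.log(sigma2 / denom) + (
            n * n * tau2 * (mean - theta0) ** 2
        ) / (2.0 * sigma2 * denom)
        # p_n = min(p_{n-1}, 1 / Lambda_n), so the sequence never increases.
        p = min(p, math.exp(-log_lr))
        out.append(p)
    return out


def benjamini_hochberg(p_values, alpha=0.05):
    """Textbook BH step-up rule: reject the k smallest p-values, where k is
    the largest rank with p_(k) <= k * alpha / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected
```

In use, one would compute a p-value stream per experiment, read off the current value of each stream at a common stopping time, and pass those values to benjamini_hochberg. As the abstract notes, the choice of a mutual stopping time can introduce dependence, so the nominal level alpha is not automatically the achieved false discovery rate; the thesis's bounds address that gap and are not reproduced here.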

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2016
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Pekelis, Leonid B
Associated with	Stanford University, Department of Statistics.
Primary advisor	Efron, Bradley
Thesis advisor	Efron, Bradley
Thesis advisor	Johnstone, Iain
Thesis advisor	Owen, Art B
Thesis advisor	Taylor, Jonathan
Advisor	Johnstone, Iain
Advisor	Owen, Art B
Advisor	Taylor, Jonathan

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Leonid B. Pekelis.
Note	Submitted to the Department of Statistics.
Thesis	Thesis (Ph.D.)--Stanford University, 2016.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
