False discoveries with dependence : an application of objective inference
Abstract/Contents
- Abstract
- Especially lately, increased availability of data and lowered barriers to data analysis, while promising for the popularity of statistics, have led to increased concern about personal biases and motives, causing distrust in published conclusions. We propose that inferential guarantees which hold regardless of individual methods and beliefs are a solution to building trust. This motivates defining an objective inference as one which is interpretable (no other information on the underlying data is needed to synthesize conclusions from results) and fair (interpretation of the same results across multiple observers with different preferences is equally easy). An inference which has these properties is more factual than one which does not. The properties of interpretability and fairness are given decision-theoretic definitions in the context of hypothesis testing. They amount to a robustness requirement on null risk with respect to nuisance parameters and transformations of data. Two examples applying these properties are explored in depth. For the first example, we regain objectivity for false discovery analysis in the presence of dependent test statistics by deriving upper confidence bounds on the false discovery quantities: false discovery proportion (Fdp), Bayesian false discovery rate (Fdr), and local false discovery rate (fdr). These upper confidence bounds are uniform across all choices of hypothesis cutoffs, motivating plotting many cutoffs at once, or listing them in a table. Extensions to theoretical vs. estimated null components are included. We call these the "U" methods, e.g. UFdp. These methods use derived covariance formulas for both the empirical process and density estimates from the Expectation-Maximization (EM) algorithm when data are correlated, and approximations to tail probabilities of the supremum of Gaussian processes over parameter sets embedded in metric spaces from Volume of Tubes and Double Sum arguments.
For the second example, we consider online experimentation, or sequential hypothesis testing of streaming data on the internet, where practitioners have short-term incentives to subvert statistical protocol. Defining an always valid p-value process that is super-uniform regardless of the stopping time chosen for the experiment allows for an objective inference. We explore a particular always valid p-value based on the mixture sequential probability ratio test (mSPRT), and show it has asymptotically optimal risk, even among tests which are allowed to fix the stopping time and maximal sample size truncation in advance. We also show how to choose the tuning parameters of an mSPRT to gain good sub-asymptotic performance. Finally, we compare multiple always valid p-value processes at once through sequential false discovery analysis. Even though the p-value processes may themselves be independent, the choice of a mutual stopping time can introduce dependence, biasing false discovery procedures. We derive false discovery rate bounds from applying the Benjamini-Hochberg (BH) procedure at any stopping time, and examine them for three common classes of stopping times.
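As a rough illustration of the second example, the sketch below computes a running always valid p-value from an mSPRT and then applies the BH procedure to several such p-values at a common stopping time. It is not taken from the thesis: it assumes normal observations with known standard deviation `sigma`, a normal mixing distribution centred at the null mean `theta0` with standard deviation `tau`, and a fixed stopping time, and the function names are hypothetical. It shows only the mechanics and does not reproduce the thesis's false discovery rate bounds or its choice of tuning parameters.

```python
import math


def msprt_always_valid_p_values(xs, theta0=0.0, sigma=1.0, tau=1.0):
    """Running always valid p-values for H0: mean == theta0.

    Assumption (not from the source): observations are N(theta, sigma^2)
    with known sigma, and the mixing distribution is N(theta0, tau^2).
    The p-value at time n is the running minimum of 1 / Lambda_n, where
    Lambda_n is the mixture likelihood ratio, capped at 1.
    """
    p = 1.0
    total = 0.0
    out = []
    for n, x in enumerate(xs, start=1):
        total += x
        xbar = total / n
        v = sigma ** 2 + n * tau ** 2
        # log of the mixture likelihood ratio for the normal/normal case
        log_lr = 0.5 * math.log(sigma ** 2 / v) + (
            n ** 2 * tau ** 2 * (xbar - theta0) ** 2
        ) / (2 * sigma ** 2 * v)
        p = min(p, math.exp(-log_lr))
        out.append(p)
    return out


def benjamini_hochberg(p_values, q=0.1):
    """Return sorted indices of hypotheses rejected by BH at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose ordered p-value falls under the BH line
        if p_values[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])


# Three hypothetical data streams, all stopped at the same time n = 20.
streams = [[3.0] * 20, [0.0] * 20, [2.5] * 20]
p_at_stop = [msprt_always_valid_p_values(s)[-1] for s in streams]
rejected = benjamini_hochberg(p_at_stop, q=0.1)
```

Here the stopping time is fixed in advance for simplicity; the abstract's concern is the case where a data-dependent mutual stopping time induces dependence between the p-values fed to BH.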
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2016 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Pekelis, Leonid B |
---|---|
Associated with | Stanford University, Department of Statistics. |
Primary advisor | Efron, Bradley |
Thesis advisor | Efron, Bradley |
Thesis advisor | Johnstone, Iain |
Thesis advisor | Owen, Art B |
Thesis advisor | Taylor, Jonathan |
Advisor | Johnstone, Iain |
Advisor | Owen, Art B |
Advisor | Taylor, Jonathan |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Leonid B. Pekelis. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2016. |
Location | electronic resource |
Access conditions
- Copyright
- © 2016 by Leonid Boris Pekelis
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).