Efficient permutation P-value estimation for gene set tests

He, Yu; Stanford University, Department of Statistics.

Abstract/Contents

Abstract: In a genome-wide expression study, gene set testing is often used to find gene sets that correlate with a treatment (disease, drug, phenotype, etc.). A gene set may contain tens to thousands of genes, and genes within a gene set are generally correlated. Permutation tests are the standard approach for obtaining p-values for these gene set tests. Plain Monte Carlo methods that generate random permutations can be computationally infeasible for small p-values. Ackermann and Strimmer (2009) find two families of test statistics that achieve the best overall performance: a linear family and a quadratic family. This dissertation first reviews the relevant background of gene set testing and permutation tests, and then provides three alternative approaches for estimating small permutation p-values efficiently. The first approach focuses on the linear statistic. Observing that the p-value can be written as the proportion of points lying in a spherical cap, we approximate the p-value by the volume of that cap. Error estimates can be derived from a generalized Stolarsky invariance principle, and alternative probabilistic proofs are provided. The second approach focuses on the quadratic statistic. Importance sampling is used to estimate the area of the (continuous) significant region on the sphere, and the volume of the region is used as an approximation to the (discrete proportion) p-value. Different proposal distributions are studied and compared. The third approach estimates the p-value with nested sampling. It may work for both the linear and the quadratic statistic. Similar ideas can be found in literature spanning combinatorics, sequential Monte Carlo, Bayesian computation, rare event estimation, network reliability, etc., and bear different names, e.g. approximate counting, nested sampling, subset simulation, and multilevel splitting. We give a thorough review of the literature in these different areas, and apply the technique to gene set testing with the quadratic test statistic. Finally, we compare the proposed methods with plain Monte Carlo and saddlepoint approximation on three expression studies in Parkinson's Disease patients. This work was supported by the US National Science Foundation under grant DMS-1521145.
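
Illustration (not part of the dissertation): the spherical cap idea for the linear statistic can be sketched in a few lines of Python. The sketch assumes a single response vector y (for a gene set this could be the sum of the standardized expression vectors) and a treatment vector x of length n. After centering and scaling, both are treated as points on a unit sphere of dimension d = n - 2, an assumption made here for illustration. The function names, the tie tolerance, and the Simpson's rule integration are all choices of this sketch. It compares the exact permutation p-value, found by full enumeration and so feasible only for small n, with the relative volume of the cap of points z satisfying z . y0 >= rho.

```python
import math
from itertools import permutations


def center_and_scale(v):
    """Center v and scale it to unit length, so it lies on a unit sphere."""
    mean = sum(v) / len(v)
    centered = [a - mean for a in v]
    norm = math.sqrt(sum(a * a for a in centered))
    return [a / norm for a in centered]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def exact_permutation_pvalue(x, y):
    """One-sided p-value by enumerating all n! orderings of x (small n only)."""
    x0, y0 = center_and_scale(x), center_and_scale(y)
    observed = dot(x0, y0)
    hits = total = 0
    for perm in permutations(x0):
        total += 1
        # Small tolerance so ties with the observed value are counted.
        if dot(perm, y0) >= observed - 1e-12:
            hits += 1
    return hits / total


def cap_fraction(rho, d, steps=2000):
    """Fraction of the unit sphere S^d (in R^(d+1)) where z . y0 >= rho.

    Uses the polar-angle form: the cap is theta in [0, arccos(rho)] and the
    surface density is proportional to sin(theta)^(d-1). Requires d >= 1.
    """
    theta0 = math.acos(max(-1.0, min(1.0, rho)))
    h = theta0 / steps  # steps must be even for composite Simpson's rule

    def f(t):
        return math.sin(t) ** (d - 1)

    s = f(0.0) + f(theta0)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f(i * h)
    cap = s * h / 3
    # Closed form for the integral of sin^(d-1) over [0, pi].
    whole = math.sqrt(math.pi) * math.exp(
        math.lgamma(d / 2) - math.lgamma((d + 1) / 2)
    )
    return cap / whole


def cap_pvalue(x, y):
    """Continuous approximation: relative volume of the spherical cap."""
    x0, y0 = center_and_scale(x), center_and_scale(y)
    return cap_fraction(dot(x0, y0), len(x) - 2)


if __name__ == "__main__":
    x = [0, 0, 0, 1, 1, 1]              # hypothetical treatment labels
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical expression values
    print("exact:", exact_permutation_pvalue(x, y))
    print("cap  :", cap_pvalue(x, y))
```

With n this small the two numbers differ noticeably, because the exact p-value can only take a handful of discrete values. The sketch shows the mechanics only and does not reproduce the dissertation's error estimates.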

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2016
Issuance	monographic
Language	English

Creators/Contributors

Associated with	He, Yu
Associated with	Stanford University, Department of Statistics.
Primary advisor	Owen, Art B
Thesis advisor	Owen, Art B
Thesis advisor	Hastie, Trevor
Thesis advisor	Wong, Wing Hung

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Yu He.
Note	Submitted to the Department of Statistics.
Thesis	Thesis (Ph.D.)--Stanford University, 2016.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
