Efficient permutation P-value estimation for gene set tests
Abstract/Contents
- Abstract
- In a genome-wide expression study, gene set testing is often used to find candidate gene sets that correlate with a treatment (disease, drug, phenotype, etc.). A gene set may contain tens to thousands of genes, and genes within a gene set are generally correlated. Permutation tests are the standard approach for obtaining p-values for these gene set tests. Plain Monte Carlo methods that generate random permutations can be computationally infeasible for small p-values. Ackermann and Strimmer (2009) find two families of test statistics that achieve the best overall performance: a linear family and a quadratic family. This dissertation first reviews the relevant background on gene set testing and permutation tests, and then provides three alternative approaches for estimating small permutation p-values efficiently. The first approach focuses on the linear statistic. Because the p-value can be written as the proportion of points lying in a spherical cap, it is approximated by the volume of that cap. Error estimates can be derived from a generalized Stolarsky invariance principle, and alternative probabilistic proofs are provided. The second approach focuses on the quadratic statistic. Importance sampling is used to estimate the area of the (continuous) significant region on the sphere, and the volume of the region is used as an approximation for the (discrete proportion) p-value. Different proposal distributions are studied and compared. The third approach estimates the p-value with nested sampling. It may work for both the linear and the quadratic statistic. Similar ideas can be found in literatures spanning combinatorics, sequential Monte Carlo, Bayesian computation, rare event estimation, and network reliability, where they bear different names, e.g. approximate counting, nested sampling, subset simulation, and multilevel splitting.
We give a thorough review of the literature in these different areas and apply the technique to gene set testing with the quadratic test statistic. Finally, we compare the proposed methods with plain Monte Carlo and saddlepoint approximation on three expression studies in Parkinson's disease patients. This work was supported by the US National Science Foundation under grant DMS-1521145.
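For orientation, the plain Monte Carlo baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the dissertation's code: the linear statistic is assumed here to be the sum, over the genes in the set, of each gene's sample correlation with the treatment labels, and the function names (`center_and_normalize`, `permutation_p_value`) are hypothetical.

```python
import math
import random


def center_and_normalize(values):
    """Subtract the mean and scale to unit Euclidean norm.

    Assumes the input is not constant (otherwise the norm is zero).
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def permutation_p_value(labels, gene_rows, n_perm=2000, seed=0):
    """One-sided plain Monte Carlo permutation p-value.

    labels    : one treatment value per sample
    gene_rows : one list of expression values per gene in the set
    The statistic (an assumed form) is the sum of per-gene correlations
    with the labels, which equals dot(y0, z) for the vectors below.
    """
    rng = random.Random(seed)
    y0 = center_and_normalize(labels)
    # z is the sum of the centered, normalized gene vectors.
    normalized = [center_and_normalize(row) for row in gene_rows]
    z = [sum(col) for col in zip(*normalized)]

    observed = dot(y0, z)
    shuffled = list(y0)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabeling of the samples
        if dot(shuffled, z) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so the estimate
    # can never fall below 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

Because the estimate is bounded below by `1 / (n_perm + 1)`, resolving a p-value near 1e-6 with this baseline needs on the order of millions of permutations, which is the cost that motivates the three alternative approaches.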
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2016 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | He, Yu |
---|---|
Associated with | Stanford University, Department of Statistics. |
Primary advisor | Owen, Art B |
Thesis advisor | Owen, Art B |
Thesis advisor | Hastie, Trevor |
Thesis advisor | Wong, Wing Hung |
Advisor | Hastie, Trevor |
Advisor | Wong, Wing Hung |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Yu He. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2016. |
Location | electronic resource |
Access conditions
- Copyright
- © 2016 by Yu He
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).