On latent systemic effects in multiple hypotheses
Abstract/Contents
- Abstract
- This dissertation addresses two closely related topics concerning latent systemic effects in multiple hypothesis testing, and provides an overview of the growing literature in the field. The first part concerns the search for associations with a primary variable among a large number of candidate variables in high-throughput settings. High-throughput hypothesis testing is complicated by systemic effects and other latent variables, and the resulting dependencies can change the relative ordering of significance levels among hypotheses. We propose a two-stage analysis to counter the effects of latent variables on the ranking of hypotheses. Our method, called LEAPP, statistically isolates the latent variables from the primary one. In simulations it gives a better ordering of hypotheses than competing methods such as SVA and EIGENSTRAT. As an illustration, we use data from the AGEMAP study, which relates gene expression to age in 16 mouse tissues. LEAPP generates rankings with greater consistency across tissues than the rankings obtained by the other methods. The second part studies the detection of DNA copy number variation (CNV) across samples. Experimental artifacts such as local trends, if not carefully removed, may be misconstrued as significant recurrent regions. We develop an alternating algorithm to adjust for the effects of latent variables on the detection of recurring CNVs. Our method, called CNVlatent, improves accuracy in detecting CNVs in simulated data compared to methods that do not adjust for latent effects. We illustrate the method on two data sets: one from the chromosome 9p region in 44 pediatric leukemia samples, and the other from a region on cytoband 11 on the q-arm of chromosome 22. CNVlatent successfully detects visible copy number changes and adjusts for the latent effects.
There are many studies on the segmentation of CNVs, but incorporating copy number information into association tests remains an open problem because copy number genotyping lacks accuracy. We propose a statistical framework for genotyping CNVs within a detected genomic region encompassing the putative CNVs, applicable to the analysis of both inherited and somatic copy number variants. To pool information across SNPs, we take into account the different response rates and noise properties of each SNP. We carry out the model calibration with an Expectation-Maximization (EM) based algorithm. Our method achieves higher estimation precision on synthetic data and generates estimates with greater consistency across a data set with replicate samples than existing methods such as CNVtools.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2011 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Sun, Yunting |
---|---|
Associated with | Stanford University, Department of Statistics |
Primary advisor | Owen, Art B |
Primary advisor | Zhang, Nancy R. (Nancy Ruonan) |
Thesis advisor | Owen, Art B |
Thesis advisor | Zhang, Nancy R. (Nancy Ruonan) |
Thesis advisor | Efron, Bradley |
Thesis advisor | Wong, Wing Hung |
Advisor | Efron, Bradley |
Advisor | Wong, Wing Hung |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Yunting Sun. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2011. |
Location | electronic resource |
Access conditions
- Copyright
- © 2011 by Yunting Sun
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).