On latent systemic effects in multiple hypotheses


Abstract/Contents

Abstract
This dissertation deals with two closely related topics concerning latent systemic effects in multiple hypothesis testing, and it also provides an overview of the growing literature in the field. The first part searches for associations between a primary variable and a great many candidate variables in high-throughput settings. High-throughput hypothesis testing can be made difficult by the presence of systemic effects and other latent variables. Such dependencies can change the relative ordering of significance levels among hypotheses. We propose a two-stage analysis to counter the effects of latent variables on the ranking of hypotheses. Our method, called LEAPP, statistically isolates the latent variables from the primary one. In simulations it orders hypotheses better than competing methods such as SVA and EIGENSTRAT. For illustration, we turn to data from the AGEMAP study, which relates gene expression to age in 16 mouse tissues. LEAPP generates rankings that are more consistent across tissues than those obtained by the other methods. The second part studies the detection of DNA copy number variation (CNV) across samples. Experimental artifacts such as local trends, if not carefully removed, may be misconstrued as significant recurrent regions. We develop an alternating algorithm that adjusts for the effects of latent variables in the detection of recurrent CNVs. Our method, called CNVlatent, detects CNVs in simulated data more accurately than methods without adjustment for latent effects. We use two data sets for illustration: one from the chromosome 9p region in 44 pediatric leukemia samples, and the other from a region on cytoband 11 on the q-arm of chromosome 22. CNVlatent successfully detects visible copy number changes and adjusts for the latent effects.
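The abstract does not spell out LEAPP's two stages, so the following is only a rough illustration of how a two-stage analysis can separate latent factors from a primary variable. The assumptions are ours, not the dissertation's: a linear model Y = x beta^T + Z Gamma^T + E with mostly zero beta; a QR rotation of x so that all but one rotated row of Y are free of the primary effect; an SVD of those rows to estimate the latent loadings (stage 1); and a trimmed least-squares fit of the remaining row on those loadings (stage 2, a simple stand-in for whatever robust fit LEAPP actually uses). The function names rotate_and_adjust and trimmed_lstsq are hypothetical.

```python
import numpy as np

def trimmed_lstsq(A, y, keep=0.8, n_iter=10):
    """Least squares refit repeatedly on the `keep` fraction of rows with the
    smallest absolute residuals (a simple robust fit; an assumption here)."""
    mask = np.ones(len(y), dtype=bool)
    for _ in range(n_iter):
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        resid = np.abs(y - A @ coef)
        mask = resid <= np.quantile(resid, keep)
    return coef

def rotate_and_adjust(Y, x, k):
    """Hypothetical two-stage adjustment. Y: (n, p) responses, x: (n,) primary
    variable, k: number of latent factors. Returns estimated effects (p,)."""
    Q, R = np.linalg.qr(x[:, None], mode="complete")  # Q: (n, n), R: (n, 1)
    Yrot = Q.T @ Y  # only row 0 of Q.T @ x is nonzero
    # Stage 1: rows 1..n-1 carry no primary effect; estimate loading space by SVD.
    _, _, Vt = np.linalg.svd(Yrot[1:], full_matrices=False)
    G = Vt[:k].T  # (p, k) basis for the latent loadings
    # Stage 2: row 0 = R[0, 0] * beta + G @ alpha + noise. Most beta are zero,
    # so a robust fit of row 0 on G estimates alpha; residuals give beta.
    y0 = Yrot[0]
    alpha = trimmed_lstsq(G, y0)
    return (y0 - G @ alpha) / R[0, 0]

# Simulated example: latent factors Z are correlated with the primary variable x.
rng = np.random.default_rng(1)
n, p, k, m = 100, 1000, 2, 50
x = rng.normal(size=n)
Z = 0.8 * x[:, None] + rng.normal(size=(n, k))
Gamma = rng.normal(size=(p, k))
beta = np.zeros(p)
beta[:m] = 1.5  # the first m variables are truly associated with x
Y = np.outer(x, beta) + Z @ Gamma.T + rng.normal(size=(n, p))

beta_hat = rotate_and_adjust(Y, x, k)
naive = x @ Y / (x @ x)  # per-variable regression on x alone, ignoring Z

def top_m(est):
    """Indices of the m largest |estimates|."""
    return np.argsort(-np.abs(est))[:m]

adj_hits = int(np.sum(top_m(beta_hat) < m))
naive_hits = int(np.sum(top_m(naive) < m))
print("true effects in top", m, "- adjusted:", adj_hits, "naive:", naive_hits)
```

In this simulation the naive regression absorbs the latent factors' correlation with x, so its ranking mixes many null variables into the top of the list, while the adjusted estimates should recover most of the 50 true effects. The exact counts depend on the random seed.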
There are many studies on the segmentation of CNVs, but incorporating copy number information into association tests remains an open problem because copy number genotyping lacks accuracy. We propose a statistical framework for genotyping CNVs within a detected genomic region encompassing the putative CNVs, for the analysis of both inherited and somatic copy number variants. To pool information across SNPs, we take into account the different response rates and noise properties of each SNP. We calibrate the model with an Expectation-Maximization (EM) based algorithm. Compared with existing methods such as CNVtools, our method achieves higher estimation precision on synthetic data and generates estimates that are more consistent across a data set with replicate samples.
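The abstract states only that the genotyping model pools SNPs with SNP-specific response rates and noise and is calibrated by EM. A minimal sketch of that idea, under simplifying assumptions of our own: each sample in a region carries one latent copy-number class c with a fixed level (0, 1, 2 here), and SNP j responds linearly, x_ij ~ N(a_j + b_j * level_c, sigma_j^2), so b_j acts as a per-SNP response rate and sigma_j as its noise. The E-step computes posterior class probabilities by pooling all SNPs in the region; the M-step updates the class weights and fits a weighted least-squares regression for each SNP. This is a generic Gaussian-mixture EM, not the dissertation's exact model and not CNVtools; em_genotype is a hypothetical name, and the initialisation assumes exactly three classes.

```python
import numpy as np

def em_genotype(X, levels=(0.0, 1.0, 2.0), n_iter=50):
    """EM for one copy-number class per sample, shared by all SNPs in a region.

    X: (n_samples, n_snps) intensities. Assumed model (illustrative only):
    x_ij ~ N(a_j + b_j * levels[c_i], sigma_j**2), c_i ~ Categorical(weights).
    Returns posteriors (n, K), class weights (K,), and per-SNP a, b, sigma.
    """
    levels = np.asarray(levels, dtype=float)
    n = X.shape[0]
    # Initialise: nearest of min/median/max of each sample's mean intensity.
    s = X.mean(axis=1)
    centres = np.array([s.min(), np.median(s), s.max()])
    W = np.eye(len(levels))[np.argmin(np.abs(s[:, None] - centres), axis=1)]
    for _ in range(n_iter):
        # M-step: class weights, then weighted regression of each SNP on level.
        Nk = W.sum(axis=0)
        weights = Nk / n
        cbar = Nk @ levels / n
        dc = levels - cbar
        xbar = X.mean(axis=0)
        b = (X - xbar).T @ (W @ dc) / (Nk @ dc**2)  # per-SNP response rate
        a = xbar - b * cbar                          # per-SNP offset
        R = X[:, None, :] - (a + b * levels[:, None])[None, :, :]  # (n, K, p)
        sigma2 = np.einsum("ik,ikj->j", W, R**2) / n  # per-SNP noise variance
        # E-step: log posterior of each class, summing evidence over SNPs.
        L = (np.log(weights + 1e-12)
             - 0.5 * np.sum(np.log(2 * np.pi * sigma2))
             - 0.5 * np.sum(R**2 / sigma2, axis=2))
        L -= L.max(axis=1, keepdims=True)  # stabilise before exponentiating
        W = np.exp(L)
        W /= W.sum(axis=1, keepdims=True)
    return W, weights, a, b, np.sqrt(sigma2)

# Synthetic region: 300 samples, 20 SNPs with different responses and noise.
rng = np.random.default_rng(0)
n, p = 300, 20
z = rng.choice(3, size=n, p=[0.2, 0.5, 0.3])  # true copy-number class
a_true = rng.uniform(-1.0, 1.0, size=p)
b_true = rng.uniform(0.5, 1.5, size=p)
sd_true = rng.uniform(0.2, 0.6, size=p)
X = a_true + np.outer(z, b_true) + rng.normal(size=(n, p)) * sd_true

W, weights, a_hat, b_hat, sd_hat = em_genotype(X)
accuracy = np.mean(W.argmax(axis=1) == z)
print("genotype accuracy:", accuracy, "class weights:", np.round(weights, 2))
```

Pooling matters here because a single noisy SNP separates neighbouring classes only weakly, while the summed log-likelihood over the region weights each SNP by its own response and noise.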

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with Sun, Yunting
Associated with Stanford University, Department of Statistics
Primary advisor Owen, Art B
Primary advisor Zhang, Nancy R. (Nancy Ruonan)
Thesis advisor Owen, Art B
Thesis advisor Zhang, Nancy R. (Nancy Ruonan)
Thesis advisor Efron, Bradley
Thesis advisor Wong, Wing Hung
Advisor Efron, Bradley
Advisor Wong, Wing Hung

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Yunting Sun.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2011.
Location electronic resource

Access conditions

Copyright
© 2011 by Yunting Sun
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
