Unraveling the genetics of disease using structured probabilistic models


Abstract/Contents

Abstract
Recent technological advances have allowed us to collect genomic data on an unprecedented scale, with the promise of revealing genetic variants, genes, and pathways disrupted in clinically relevant human traits. However, identifying functional variants and ultimately unraveling the genetics of complex disease from such data have presented significant challenges. With millions of genetic factors to consider, spurious associations and lack of statistical power are major hurdles. Further, we cannot easily assess functional roles even for known trait-associated variants, particularly for those that lie outside of protein-coding regions of the genome. To address these challenges in identifying the genetic factors underlying complex traits, we have developed probabilistic machine learning methods that leverage biological structure and prior knowledge. In this thesis, we describe four applications of such models. First, we present a method for reconstructing causal gene networks from interventional genetic interaction data in model organisms. Here, we are able to identify intricate functional dependencies among hundreds of genes affecting a complex trait. We have applied this method to understanding the genetics of protein folding in yeast, where we demonstrate the ability to recapitulate the details, including ordering, of known pathways, and to make novel functional predictions. Second, we present PriorNet, a method for incorporating gene network and pathway information into the analysis of population-level studies of genetic variation in human disease. PriorNet utilizes a flexible Markov Random Field prior to propagate information between functionally related genes and related diseases, in order to improve statistical power in large-scale disease studies. We demonstrate a significant improvement in the discovery of disease-relevant genes in studies of three autoimmune diseases.
Next, we extend the intuitions of PriorNet in a method for identifying interactions between genetic variants in human disease, to begin to understand how genes work together in complex disease processes. Our method, GAIT, leverages gene networks, network structure, and other patterns to adaptively prioritize candidate interactions for testing and to dramatically reduce the burden of multiple hypothesis correction, in order to identify a large number of interactions in diverse human disease studies. Finally, we discuss the identification of functional variants on a large scale through the use of gene expression as a high-resolution cellular phenotype. We have sequenced RNA from 922 genotyped individuals to provide a direct window into the distribution, properties, and consequences of thousands of regulatory variants affecting diverse gene expression traits, including splicing and allelic expression. From the identified variants, we also train a model, LRVM, for predicting regulatory consequences based on the location and genomic properties of each variant.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2013
Issuance monographic
Language English

Creators/Contributors

Associated with Battle, Alexis Jane
Associated with Stanford University, Computer Science Department.
Primary advisor Koller, Daphne
Thesis advisor Koller, Daphne
Thesis advisor Batzoglou, Serafim
Thesis advisor Levinson, Douglas

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Alexis Jane Battle.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2013.
Location electronic resource

Access conditions

Copyright
© 2013 by Alexis Jane Battle
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
