Unraveling the genetics of disease using structured probabilistic models
- Recent technological advances have allowed us to collect genomic data on an unprecedented scale, with the promise of revealing genetic variants, genes, and pathways disrupted in clinically relevant human traits. However, identifying functional variants and ultimately unraveling the genetics of complex disease from such data have presented significant challenges. With millions of genetic factors to consider, spurious associations and lack of statistical power are major hurdles. Further, we cannot easily assess functional roles even for known trait-associated variants, particularly for those that lie outside of protein-coding regions of the genome. To address these challenges in identifying the genetic factors underlying complex traits, we have developed probabilistic machine learning methods that leverage biological structure and prior knowledge. In this thesis, we describe four applications of such models. First, we present a method for reconstructing causal gene networks from interventional genetic interaction data in model organisms. Here, we are able to identify intricate functional dependencies among hundreds of genes affecting a complex trait. We have applied this method to understanding the genetics of protein folding in yeast, where we demonstrate the ability to recapitulate the details of known pathways, including their ordering, and to make novel functional predictions. Second, we present PriorNet, a method for incorporating gene network and pathway information into the analysis of population-level studies of genetic variation in human disease. PriorNet uses a flexible Markov random field prior to propagate information between functionally related genes and related diseases, in order to improve statistical power in large-scale disease studies. We demonstrate a significant improvement in the discovery of disease-relevant genes in studies of three autoimmune diseases.
Next, we extend the intuitions of PriorNet in a method for identifying interactions between genetic variants in human disease, to begin to understand how genes work together in complex disease processes. Our method, GAIT, leverages gene networks, network structure, and other patterns to adaptively prioritize candidate interactions for testing. This dramatically reduces the burden of multiple hypothesis correction and allows us to identify a large number of interactions in diverse human disease studies. Finally, we discuss the identification of functional variants on a large scale through the use of gene expression as a high-resolution cellular phenotype. We have sequenced RNA from 922 genotyped individuals to provide a direct window into the distribution, properties, and consequences of thousands of regulatory variants affecting diverse gene expression traits, including splicing and allelic expression. From the identified variants, we also train a model, LRVM, for predicting regulatory consequences based on the location and genomic properties of each variant.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Battle, Alexis Jane
|Stanford University, Computer Science Department.
|Statement of responsibility
|Alexis Jane Battle.
|Submitted to the Department of Computer Science.
|Thesis (Ph.D.)--Stanford University, 2013.
- © 2013 by Alexis Jane Battle