Development, evaluation, and application of methods for causal gene prioritization in polygenic disease

Abstract/Contents

Abstract
Common but complex diseases, including type 2 diabetes, Alzheimer's disease, and others, account for a vast proportion of global disease morbidity and mortality and have a substantial genetic risk component. However, determining the biological mechanisms for this risk remains an important unsolved task in the field of human genetics, since most (>90%) of the disease-associated genetic variants identified in genome-wide association studies (GWAS) lie outside protein-coding regions of the genome and are therefore suspected to act via non-coding mechanisms such as altering gene regulation. One approach to interpreting such GWAS results has been to integrate them with functional genomics data, including molecular quantitative trait locus (QTL) studies, which measure the effects of GWAS variants on molecular traits such as gene expression and alternative splicing. A wide variety of computational methods have been developed to integrate GWAS and QTL data to prioritize likely causal genes affecting disease. Yet these methods frequently disagree about the most likely causal gene at a locus, and there are few avenues for understanding more intuitively the alleged links between specific variants and phenotypes. Here, I attempt to mitigate these shortcomings. First, I download, reformat, and aggregate GWAS summary statistics and provide them for public download, visualization, and comparison with QTL data in a web application and GWAS database named LocusCompare. Next, I apply and compare several different gene prioritization methods using both actual and simulated GWAS data, and I quantitatively benchmark their performance on data following standard assumptions. I combine all these methods in an ensemble method, HYDRA, that achieves superior performance on simulated data, with an intuitive Snakemake implementation to facilitate rapid and easy use by other researchers.
Finally, in an application to type 2 diabetes, I show that gene prioritization methods producing a ranked list of likely causal genes should be only a starting point for further investigation, and I demonstrate the advantages of integrated prioritization using multiple related GWAS traits in multiple disease-relevant tissues. These approaches are easily adaptable and thus broadly applicable for identifying causal genes in a variety of polygenic diseases.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Gloudemans, Michael Joseph
Degree supervisor Montgomery, Stephen, 1979-
Thesis advisor Montgomery, Stephen, 1979-
Thesis advisor Knowles, Joshua
Thesis advisor Owen, Art B
Thesis advisor Rivas, Manuel (Manuel A.)
Degree committee member Knowles, Joshua
Degree committee member Owen, Art B
Degree committee member Rivas, Manuel (Manuel A.)
Associated with Stanford University, Program in Biomedical Informatics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Michael Joseph Gloudemans.
Note Submitted to the Program in Biomedical Informatics.
Thesis Thesis Ph.D. Stanford University 2021.
Location https://purl.stanford.edu/wp071hp2456

Access conditions

Copyright
© 2021 by Michael Joseph Gloudemans
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
