Development, evaluation, and application of methods for causal gene prioritization in polygenic disease
Abstract/Contents
- Abstract
- Common but complex diseases, including type 2 diabetes, Alzheimer's disease, and others, account for a vast proportion of global disease morbidity and mortality and have a substantial genetic risk component. However, determining the biological mechanisms for this risk remains an important unsolved task in the field of human genetics, since most (> 90%) of disease-associated genetic variants lie outside of protein-coding regions of the genome, and therefore are suspected to act via non-coding mechanisms such as altering gene regulation. One approach to interpreting results from genome-wide association studies (GWAS) has been to integrate them with functional genomics data, including molecular quantitative trait locus (QTL) studies, which measure the effects of GWAS variants on molecular traits such as gene expression and alternative splicing. A wide variety of computational methods have been invented to integrate GWAS and QTL data to prioritize likely causal genes affecting disease. Yet these methods frequently disagree about the most likely causal gene at a locus, and avenues are limited for more intuitively understanding the alleged links between specific variants and phenotypes. Here, I attempt to mitigate these shortcomings. First, I download, reformat, and aggregate GWAS summary statistics and provide them for public download, visualization, and comparison with QTL data in a web application and GWAS database named LocusCompare. Next, I apply and compare several different gene prioritization methods using both actual GWAS and simulated GWAS data, and I quantitatively benchmark their performance on data following standard assumptions. I combine all these methods in an ensemble method, HYDRA, that achieves superior performance on simulated data, with an intuitive Snakemake implementation to facilitate rapid and easy use by other researchers.
Finally, in an application to type 2 diabetes, I show that gene prioritization methods producing a ranked list of likely causal genes should be only a starting point for further investigation, and I demonstrate the advantages of integrated prioritization using multiple related GWAS traits in multiple disease-relevant tissues. These approaches are easily adaptable and thus broadly applicable for identifying causal genes in a variety of polygenic diseases.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Gloudemans, Michael Joseph |
---|---|
Degree supervisor | Montgomery, Stephen, 1979- |
Thesis advisor | Montgomery, Stephen, 1979- |
Thesis advisor | Knowles, Joshua |
Thesis advisor | Owen, Art B |
Thesis advisor | Rivas, Manuel (Manuel A.) |
Degree committee member | Knowles, Joshua |
Degree committee member | Owen, Art B |
Degree committee member | Rivas, Manuel (Manuel A.) |
Associated with | Stanford University, Program in Biomedical Informatics |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Michael Joseph Gloudemans. |
---|---|
Note | Submitted to the Program in Biomedical Informatics. |
Thesis | Thesis Ph.D. Stanford University 2021. |
Location | https://purl.stanford.edu/wp071hp2456 |
Access conditions
- Copyright
- © 2021 by Michael Joseph Gloudemans
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).