Development, evaluation, and application of methods for causal gene prioritization in polygenic disease

Gloudemans, Michael Joseph

Abstract/Contents

Abstract: Common but complex diseases, including type 2 diabetes, Alzheimer's disease, and others, account for a vast proportion of global disease morbidity and mortality and have a substantial genetic risk component. However, determining the biological mechanisms for this risk remains an important unsolved task in the field of human genetics, since most (>90%) disease-associated genetic variants identified in genome-wide association studies (GWAS) lie outside of protein-coding regions of the genome and are therefore suspected to act via non-coding mechanisms such as altering gene regulation. One approach to interpreting such GWAS results has been to integrate them with functional genomics data, including molecular quantitative trait locus (QTL) studies, which measure the effects of genetic variants on molecular traits such as gene expression and alternative splicing. A wide variety of computational methods have been developed to integrate GWAS and QTL data to prioritize likely causal genes affecting disease. Yet these methods frequently disagree about the most likely causal gene at a locus, and there are few avenues for intuitively understanding the alleged links between specific variants and phenotypes. Here, I attempt to mitigate these shortcomings. First, I download, reformat, and aggregate GWAS summary statistics and provide them for public download, visualization, and comparison with QTL data in a web application and GWAS database named LocusCompare. Next, I apply and compare several different gene prioritization methods using both actual and simulated GWAS data, and I quantitatively benchmark their performance on data following standard assumptions. I combine all these methods in an ensemble method, HYDRA, that achieves superior performance on simulated data, with an intuitive Snakemake implementation to facilitate rapid and easy use by other researchers. Finally, in an application to type 2 diabetes, I show that gene prioritization methods producing a ranked list of likely causal genes should be only a starting point for further investigation, and I demonstrate the advantages of integrated prioritization using multiple related GWAS traits in multiple disease-relevant tissues. These approaches are easily adaptable and thus broadly applicable for identifying causal genes in a variety of polygenic diseases.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2021; ©2021
Publication date	2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Gloudemans, Michael Joseph
Degree supervisor	Montgomery, Stephen, 1979-
Thesis advisor	Montgomery, Stephen, 1979-
Thesis advisor	Knowles, Joshua
Thesis advisor	Owen, Art B
Thesis advisor	Rivas, Manuel (Manuel A.)
Degree committee member	Knowles, Joshua
Degree committee member	Owen, Art B
Degree committee member	Rivas, Manuel (Manuel A.)
Associated with	Stanford University, Program in Biomedical Informatics

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Michael Joseph Gloudemans.
Note	Submitted to the Program in Biomedical Informatics.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/wp071hp2456

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
