Integrative approaches to gene-gene, gene-disease and gene-environment mapping


Three of the most pressing questions in human disease genetics are: 1) which genes cause disease, 2) how genes relate to each other to form biological pathways, and 3) how the environment modulates genetic disease risk. This thesis describes contributions to methods addressing each of these three questions. Regarding the problem of identifying causal genes for complex traits, this thesis explores properties of transcriptome-wide association studies (TWAS), a class of methods that intersect GWAS and expression quantitative trait locus (eQTL) datasets to find gene-trait associations, using simulations and case studies of literature-curated candidate causal genes for schizophrenia, LDL cholesterol and Crohn's disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene, as well as loci where TWAS prioritizes multiple genes, some of which are unlikely to be causal, because they share the same variants as eQTLs. We illustrate that TWAS is especially prone to spurious prioritization when using expression data from tissues or cell types that are less related to the trait, due to substantial variation in both expression levels and eQTL strengths across cell types. Nonetheless, TWAS prioritizes candidate causal genes at GWAS loci more accurately than simple baselines based on proximity to the lead GWAS variant and expression in trait-related tissues. We discuss current strategies and future opportunities for improving the performance of TWAS for causal gene prioritization. Our results showcase the strengths and limitations of using expression variation across individuals to determine causal genes at GWAS loci and provide guidelines and best practices when using TWAS to prioritize candidate causal genes.
Regarding the problem of understanding environmental modulation of complex traits, this thesis describes an application of Mendelian Randomization (MR) to 328,459 individuals in the UK Biobank cohort to interrogate the relationship between body mass index (BMI) and diabetes risk across diverse strata of BMI, diabetes family history, and genome-wide polygenic risk scores. Though lifestyle interventions to reduce BMI are critical public health strategies for type 2 diabetes prevention, and weight loss interventions have shown demonstrable benefit for high-risk individuals, it is unclear whether the same benefits apply to those at lower risk. We found that diabetes prevalence increased sharply with BMI, family history of diabetes, and genetic risk, and increased marginally with BMI-adjusted genetic risk. However, genetic risk scores were much less predictive of diabetes status than family history, particularly after correcting for BMI. Conversely, predicted risk reduction from weight loss was strikingly similar across BMI and genetic risk categories. Weight loss was predicted to substantially reduce diabetes risk even among lower-risk individuals: a 1 kg/m² BMI reduction was associated with a 1.31-fold reduction (95% confidence interval [CI], 1.25-1.38) in diabetes odds among individuals without a family history of diabetes, a 1.26-fold reduction (95% CI, 1.18-1.35) among individuals at low genetic risk, and a 1.28-fold reduction (95% CI, 1.19-1.37) among individuals at low BMI-adjusted genetic risk, all nearly identical to the full cohort (1.29-fold reduction; 95% CI, 1.25-1.35). In fact, individuals without family history were predicted to have even greater risk reduction than individuals with family history (1.31-fold vs 1.19-fold reduction, p = 0.02). Overall, we found that lower BMI is consistently associated with reduced diabetes risk across BMI, family history and genetic risk categories, suggesting all individuals can substantially reduce their diabetes risk through weight loss. Our results support the broad deployment of weight-loss interventions to individuals at all levels of diabetes risk.
Regarding the problem of how genes relate to each other, this thesis describes the development of an approach to map co-essentiality, the tendency of genes with similar functions to have correlated knockout fitness profiles across cell lines, using generalized least squares (GLS). Our approach is well-powered, flexible and statistically well-calibrated, and avoids the pervasive false positives of previous approaches by appropriately accounting for the relatedness among cell lines. Applying the method to a compendium of 485 genome-wide CRISPR/Cas9 essentiality screens substantially improves recapitulation of known protein complexes and pathway interactions relative to prior approaches. Our methodological improvements enable unbiased genome-wide clustering based on co-essentiality profiles; we recover pathways and complexes as diverse as MAPK/ERK, PI3K/AKT/mTOR, the ribosome, the peroxisome, regulators of the DNA damage response and chromatin remodeling. These clusters also nominate roles for uncharacterized and poorly characterized genes in known pathways. Our genome-wide pathway map is a valuable resource for biological hypothesis generation.
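The generalized least squares idea mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the function name `gls_slope`, the toy data, and the way the cell-line covariance `sigma` is built here are all assumptions made for illustration. The sketch regresses one gene's fitness profile on another's across cell lines, after whitening both with the Cholesky factor of an assumed covariance among cell lines, so that related cell lines are not treated as independent observations.

```python
import numpy as np

def gls_slope(x, y, sigma):
    """GLS regression of y on x (with intercept) under error covariance sigma.

    x, y  : 1-D arrays, one value per cell line (hypothetical fitness profiles)
    sigma : (n, n) positive-definite covariance among cell lines (assumed known)
    Returns the slope estimate and its standard error.
    """
    n = len(x)
    chol = np.linalg.cholesky(sigma)           # sigma = chol @ chol.T
    design = np.column_stack([np.ones(n), x])  # intercept + predictor
    # Whitening: multiplying by chol^{-1} makes the errors uncorrelated,
    # so ordinary least squares on the whitened data is GLS on the original.
    design_w = np.linalg.solve(chol, design)
    y_w = np.linalg.solve(chol, y)
    beta, *_ = np.linalg.lstsq(design_w, y_w, rcond=None)
    resid = y_w - design_w @ beta
    s2 = resid @ resid / (n - 2)               # residual variance, 2 parameters
    cov_beta = s2 * np.linalg.inv(design_w.T @ design_w)
    return beta[1], np.sqrt(cov_beta[1, 1])

# Toy usage: 30 cell lines in 3 related groups (an invented covariance structure).
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(3), 10)
sigma = 0.6 * (groups[:, None] == groups[None, :]) + 0.4 * np.eye(30)
gene_a = rng.multivariate_normal(np.zeros(30), sigma)
gene_b = rng.multivariate_normal(np.zeros(30), sigma)
slope, se = gls_slope(gene_a, gene_b, sigma)
print(f"slope = {slope:.3f}, standard error = {se:.3f}")
```

With `sigma` set to the identity matrix this reduces to ordinary least squares; in a real analysis the covariance among cell lines would have to be estimated from the screening data rather than specified by hand.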


Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English


Author Wainberg, Michael
Degree supervisor Bassik, Michael
Degree supervisor Kundaje, Anshul, 1980-
Thesis advisor Bassik, Michael
Thesis advisor Kundaje, Anshul, 1980-
Thesis advisor Dror, Ron, 1975-
Thesis advisor Rivas, Manuel
Degree committee member Dror, Ron, 1975-
Degree committee member Rivas, Manuel
Associated with Stanford University, Computer Science Department.


Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Michael Wainberg.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2019.
Location electronic resource

Access conditions

© 2019 by Michael Wainberg
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
