Learning genomic and molecular mediators of genotype-phenotype associations
Abstract/Contents
- Abstract
- The vast majority of genomic variants are non-coding, and many disrupt regulatory elements, causing dysregulation of gene expression. However, the functional mechanisms by which non-coding variants operate at the molecular level, as well as their tissue-specific downstream effects on cellular, organismal and disease phenotypes, remain challenging to decipher. Firstly, complex phenotypes such as physical activity patterns are difficult to characterize and measure. Secondly, even after inferring statistical associations between genetic loci and complex phenotypes, identifying the causal variants is challenging because of linkage disequilibrium. Finally, elucidating the functional molecular mechanisms that mediate the translation of genotypic variation into phenotypic effects remains an open challenge in the field. This thesis attempts to address these three challenges via the development and application of statistical and deep learning approaches to mine large genomic, molecular and phenotypic datasets. The MyHeart Counts study serves as an example of how wearable and mobile technologies enable unobtrusive real-time measurements of complex phenotypes such as exercise and physical activity patterns. These technologies also enable rapid recruitment of large study cohorts and facilitate fully digital randomized controlled trials with low barriers to entry. Such technologies also facilitate the compilation of population-level biobanks, such as the UK Biobank, by enabling acquisition of lifestyle and activity data at scale. Having acquired complex phenotypes on large cohorts, we can begin to investigate the effects of genomic variation on these phenotypes by performing genome-wide association studies (GWAS). Functional GWAS SNPs can be identified via in silico interrogation of predictive deep learning models of regulatory DNA.
Here, I present convolutional neural network models trained on genome-wide chromatin profiling experiments to interpret and fine-map GWAS SNPs by leveraging their ability to learn predictive DNA sequence syntax. Case studies in colorectal cancer and Alzheimer's disease are presented to illustrate the application of these methods. To improve model stability and interpretability, I developed deep learning models that can predict regulatory chromatin profiles at single-base resolution, accounting for and correcting confounding experimental biases. I also contributed to several collaborative investigations of the molecular basis of complex cellular phenotypes. We identified the Sp1 regulatory protein as a key regulator of matrix stiffness and induction of tumorigenic phenotypes in mammary epithelium; the PI3K pathway as a key modulator of the efficiency of stem cell differentiation; and transcription factor networks that regulate murine muscle stem cell aging through differentiation. In summary, this thesis presents new computational approaches for linking genotype to phenotype through molecular mechanisms.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2020; ©2020 |
Publication date | 2020; 2020 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Shcherbina, Anna |
---|---|
Degree supervisor | Ashley, Euan A |
Degree supervisor | Kundaje, Anshul, 1980- |
Thesis advisor | Ashley, Euan A |
Thesis advisor | Kundaje, Anshul, 1980- |
Thesis advisor | Altman, Russ |
Thesis advisor | Rivas, Manuel |
Degree committee member | Altman, Russ |
Degree committee member | Rivas, Manuel |
Associated with | Stanford University, Department of Biomedical Informatics |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Anna Shcherbina. |
---|---|
Note | Submitted to the Department of Biomedical Informatics. |
Thesis | Thesis Ph.D. Stanford University 2020. |
Location | electronic resource |
Access conditions
- Copyright
- © 2020 by Anna Shcherbina
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).