Learning genomic and molecular mediators of genotype-phenotype associations

Abstract/Contents

Abstract
The vast majority of genomic variants are non-coding, and many disrupt regulatory elements, causing dysregulation of gene expression. However, the functional mechanisms by which non-coding variants operate at the molecular level, as well as their tissue-specific downstream effects on cellular, organismal and disease phenotypes, remain challenging to decipher. Firstly, complex phenotypes such as physical activity patterns are difficult to characterize and measure. Secondly, even after inferring statistical associations between genetic loci and complex phenotypes, identifying the causal variants is challenging because of linkage disequilibrium. Finally, elucidating the functional molecular mechanisms that mediate the manifestation of genotypic variation as phenotypic effects remains an open challenge in the field. This thesis attempts to address these three challenges via the development and application of statistical and deep learning approaches to mine large genomic, molecular and phenotypic datasets. The MyHeart Counts study serves as an example of how wearable and mobile technologies enable unobtrusive real-time measurements of complex phenotypes such as exercise and physical activity patterns. These technologies also enable rapid recruitment of large study cohorts and facilitate fully digital randomized controlled trials with low barriers to entry. Such technologies also facilitate the compilation of population-level biobanks, such as the UK Biobank, by enabling acquisition of lifestyle and activity data at scale. Having acquired complex phenotypes on large cohorts, we can begin to investigate the effects of genomic variation on these phenotypes by performing genome-wide association studies (GWAS). Functional GWAS SNPs can be identified via in silico interrogation of predictive deep learning models of regulatory DNA.
Here, I present convolutional neural network models trained on genome-wide chromatin profiling experiments to interpret and fine-map GWAS SNPs by leveraging their ability to learn predictive DNA sequence syntax. Case studies in colorectal cancer and Alzheimer's disease are presented to illustrate the application of these methods. To improve model stability and interpretability, I developed deep learning models that can predict regulatory chromatin profiles at single base resolution, accounting for and correcting confounding experimental biases. I also contributed to several collaborative investigations of the molecular basis of complex cellular phenotypes. We identified the Sp1 regulatory protein as a key regulator of matrix stiffness and induction of tumorigenic phenotypes in mammary epithelium; the PI3K pathway as a key modulator of the efficiency of stem cell differentiation; and transcription factor networks that regulate murine muscle stem cell aging through differentiation. In summary, this thesis presents new computational approaches for linking genotype to phenotype through molecular mechanisms.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2020; ©2020
Publication date 2020; 2020
Issuance monographic
Language English

Creators/Contributors

Author Shcherbina, Anna
Degree supervisor Ashley, Euan A
Degree supervisor Kundaje, Anshul, 1980-
Thesis advisor Ashley, Euan A
Thesis advisor Kundaje, Anshul, 1980-
Thesis advisor Altman, Russ
Thesis advisor Rivas, Manuel
Degree committee member Altman, Russ
Degree committee member Rivas, Manuel
Associated with Stanford University, Department of Biomedical Informatics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Anna Shcherbina.
Note Submitted to the Department of Biomedical Informatics.
Thesis Thesis Ph.D. Stanford University 2020.
Location electronic resource

Access conditions

Copyright
© 2020 by Anna Shcherbina
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
