Predicting the functional impact of de novo mutations in human diseases and disorders


Abstract/Contents

Abstract
Millions of human genomes and exomes have been sequenced. Still, the clinical application of genome sequencing remains limited due to the difficulty of distinguishing disease-causing mutations from benign genetic variation. Rare diseases collectively have a high incidence in the population, but identifying their genetic causes individually is statistically intractable, as it requires population-level sequencing studies to achieve significance. In addition, monogenic disorders (driven by single mutations) often present a needle-in-a-haystack problem, as the causal mutations are hidden amongst a much larger number of randomly occurring, phenotypically neutral mutations. Finally, the effects of some types of mutations are more difficult to pinpoint from genomic sequence alone than others. Mutations in protein-coding sequences can be analyzed in terms of their impact on protein structure and based on the evolutionary conservation of protein sequences. In contrast, relevant non-coding regulatory mutations are very difficult to identify and understand mechanistically, a problem made only more challenging by the fact that up to 98% of the human genome is non-coding. In the domain of protein-coding mutations, we developed a novel neural network method (PrimateAI) that incorporates protein structure, evolutionary constraints, and common polymorphisms in both humans and non-human primates to predict the pathogenicity of protein-coding mutations in children with developmental delay disorders (DDDs). Based on hundreds of thousands of common variants derived from population sequencing of six non-human primate species, we discovered 14 new candidate genes implicated in DDDs at genome-wide significance.
To address the non-coding mutation challenge, we focused on several broad categories of rare developmental disorders: congenital heart disorders (CHDs) and neurological developmental disorders (NDDs), including autism spectrum disorders (ASD). In the case of CHD and ASD, to reduce the search space among the vast non-coding portions of the genome, we used a combination of open chromatin profiling of fetal human hearts and brains at multiple post-conception developmental time points and interpretable base-pair-resolution neural network models (BPNet) to understand the sequence drivers of chromatin accessibility. For the first time, we identified genetic mutations that can be associated with gene activity in arteries for CHD, and in radial glial progenitor cells for ASD. These computational methods, combined with cataloging common variation in additional primate species, will provide a framework for improving the interpretation of millions of variants of uncertain significance, further advancing the clinical utility of human genome sequencing. In addition to understanding the causal role of these mutations, we also characterized the cellular similarity of major cardiac cell types derived from induced pluripotent stem cells (iPSCs) to their human developmental counterparts. This comparison highlighted the differentiation systems that produce cells with high concordance to their in vivo counterparts, namely cardiomyocytes and endothelial cells. We further used this knowledge to perform Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-based gene editing experiments to functionally validate the predictions of the BPNet models. In summary, this thesis presents new computational approaches to understand the pathogenicity of human mutations and functionally validate them in well-characterized iPSC-derived cell systems.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Sundaram, Laksshman
Degree supervisor Greenleaf, William James
Degree supervisor Kundaje, Anshul, 1980-
Thesis advisor Greenleaf, William James
Thesis advisor Kundaje, Anshul, 1980-
Thesis advisor Quertermous, Thomas
Thesis advisor Yamins, Daniel
Degree committee member Quertermous, Thomas
Degree committee member Yamins, Daniel
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Laksshman Sundaram.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2021.
Location https://purl.stanford.edu/kd457gs1189

Access conditions

Copyright
© 2021 by Laksshman Sundaram
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
