Towards automating genetic disease diagnosis : genes, phenotypes, patient privacy

Abstract/Contents

Abstract
Genome sequencing is widely used in clinical practice. An individual typically carries over 4 million variants genome-wide, and exome sequencing identifies approximately 500 variants of uncertain significance (VUS) near protein-coding genes that have no clear clinical interpretation. Because clinicians require dozens of hours to diagnose each patient, and an estimated 60 million individuals will be sequenced over the next 5 years, patient diagnosis and genomic analysis are becoming a critical bottleneck. Automated and effective computational tools are essential to handle the increasing scale of patient genomes. Predicting the pathogenicity of these variants is a first step toward identifying the genetic basis of a monogenic disease. Effective strategies for Mendelian disease diagnosis combine the patient's genetic data from sequencing with phenotype data from the electronic medical record (EMR) system to prioritize the genetic variation causing the patient's disease. In chapter 1, we provide an overview of Mendelian disease diagnosis, its challenges, current approaches, and the solutions we developed towards automating disease diagnosis. In chapters 2, 3, and 4, we introduce methods to improve the interpretation of a patient's genetic variation. Specifically, in chapter 2 we introduce M-CAP, the first clinically applicable pathogenicity classifier for VUS that alter the encoded amino acid, the largest class of known pathogenic mutations. In chapter 3, we extend this methodology to build S-CAP, the first model to predict the pathogenicity of previously ignored variants that disrupt the pre-mRNA splicing mechanism, the second largest class of known pathogenic mutations. In chapter 4, we explore a strategy to start identifying noncoding disease-causing mutations from whole genome sequencing.
In chapter 5, we introduce Phrank, a method to measure similarity between sets of phenotypes and prioritize the genes that best explain the patient's disease symptoms. Just as each patient has a list of phenotypes in the medical record describing signs and symptoms, genes also have associated phenotypes listed in databases such as OMIM. Incorporating phenotype information into the diagnostic pipeline greatly improves the effectiveness and interpretability of the patient's genomic data. The above methods highlight the serve-or-protect dilemma commonly seen when working with patient data. The tools in chapters 2-5 require patient data (both genotype and phenotype) to be shared with clinicians and between hospitals, yet all of these inputs are extremely sensitive. To protect patient privacy, genotypes and phenotypes should not be shared with anyone. In chapter 6, we introduce a novel set of secure cryptographic protocols to diagnose Mendelian diseases while revealing the minimal amount of genetic information. In chapter 7, we extend these strategies to securely compute the Phrank similarity operation over patient phenotype information. We conclude in chapter 8, where we summarize the novel developments in this dissertation and enumerate next steps based on this research. Chapter 2 was published in Nature Genetics. Chapter 3 has also just been published in Nature Genetics. Chapter 4 has been published in the European Journal of Human Genetics. Chapter 5 is published in Genetics in Medicine. Chapter 6 is published in Science, and Chapter 7 is currently being submitted for publication.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2019; ©2019
Publication date 2019; 2019
Issuance monographic
Language English

Creators/Contributors

Author Jagadeesh, Karthik Anand
Degree supervisor Bejerano, Gill, 1970-
Thesis advisor Bejerano, Gill, 1970-
Thesis advisor Bernstein, Jon
Thesis advisor Boneh, Dan, 1969-
Degree committee member Bernstein, Jon
Degree committee member Boneh, Dan, 1969-
Associated with Stanford University, Computer Science Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Karthik Anand Jagadeesh.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Karthik Anand Jagadeesh
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
