Intelligent systems for personalized genomic medicine

Abstract/Contents

Abstract
This thesis explores applications of artificial intelligence in personalized genomic medicine, focusing on three central challenges: decoding raw genomic sequence, interpreting genetic mutations, and making predictions based on this data. The first part of the thesis focuses on the sequencing problem. We argue that the cost of genome resequencing can be significantly reduced by leveraging vast existing amounts of genetic data to train statistical models of the human genome. Specifically, we introduce a new sequencing technology which reduces the cost of genome haplotyping by up to tenfold by augmenting an existing wet-lab protocol with statistical models. We also extend this technology to the analysis of the human gut microbiome, enabling scientists to study for the first time the fine variation among individual microbial strains. The second part of the thesis focuses on the problem of interpretation. We introduce GWASdb, a machine reading system that enables researchers and clinicians to access tens of thousands of genotype-phenotype associations in the form of a structured database automatically constructed from the biomedical literature. We find that our system discovers thousands of genetic associations that are missing from even the largest human-curated repositories. These associations can be used to predict disease risks and shed light on genome biology. More generally, our system represents the largest fully automated information extraction effort in the GWAS domain, and demonstrates the feasibility and the value of human-machine literature curation. Finally, the last part of the thesis focuses on prediction. We study two core machine learning problems that arise in biomedical applications: uncertainty estimation and semi-supervised prediction. We present techniques for estimating disease risk that do not assume that data originates from an underlying probability distribution, allowing it instead to be generated by a malicious adversary.
We also introduce a new semi-supervised framework that helps researchers leverage large amounts of unlabeled data and that bridges two fundamental types of machine learning methods: discriminative and generative. Our approach extends to modern deep learning algorithms and establishes a new state-of-the-art semi-supervised accuracy on standard benchmarks. In summary, our work will help scientists better understand the human genome and turn this knowledge into personalized medical technologies.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2017
Issuance monographic
Language English

Creators/Contributors

Associated with Kuleshov, Volodymyr
Associated with Stanford University, Computer Science Department.
Primary advisor Batzoglou, Serafim
Primary advisor Snyder, Michael, Ph. D
Thesis advisor Batzoglou, Serafim
Thesis advisor Snyder, Michael, Ph. D
Thesis advisor Kundaje, Anshul, 1980-
Advisor Kundaje, Anshul, 1980-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Volodymyr Kuleshov.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2017.
Location electronic resource

Access conditions

Copyright
© 2017 by Volodymyr Kuleshov
