Automated discovery of novel gene-trait/disease hypotheses

Abstract/Contents

Abstract
Advances in genome sequencing technology give us unprecedented access to read and study the genetic material that encodes every living organism. Despite continuous research efforts to fully understand the genome, our knowledge of gene function remains largely incomplete. This poses many problems, especially in clinical settings. It is estimated that over 7 million births each year are affected by genetic disorders worldwide, and even with genome sequencing being available and even affordable, many patients remain undiagnosed. Identifying the genetic cause is a labor-intensive task, especially for disorders caused by genes of unknown function or pathogenicity. This dissertation describes machine learning approaches to improve the discovery rate of novel gene functions in two domains: monogenic disorder diagnosis and inbred mouse strain analysis. These methods highlight the novel gene-phenotype hypotheses that are most likely to be true, to inspire further experimental validation and ultimately expand our gene function knowledge for clinical applications.
To offer potential novel diagnosis hypotheses for monogenic disease patients who cannot be diagnosed with current patient-oriented knowledge bases, I introduced InpherNet. InpherNet is a gene prioritization classifier that ranks candidate genes based on the phenotypic annotations of their biological neighbors. It aims to propose novel pathogenic, disease-causing genes when annotation based on previously diagnosed patients is missing or partial.
Inbred mouse strains are carefully maintained populations of mice that have gone through successive sibling mating for over 20 generations. Through this repeated inbreeding, each strain became genetically homogeneous while developing strain-specific, distinctive genotypes and phenotypes. Many genetic factors have been discovered by mapping inter-strain genotype differences against phenotype differences. To accelerate novel functional discoveries using inbred mouse strains, I built AIMHIGH (Analysis of Inbred Mouse strains' High-Impact Genotype-phenotype Hypotheses). AIMHIGH uses experiments that measure phenotypic differences among inbred strains to automatically select trait-relevant candidate genes. Any undiscovered gene-phenotype hypotheses are then ranked by a literature-based discovery classifier to propose the most promising candidates. Together, these methods employ machine learning to suggest the most exciting testable hypotheses, accelerating novel gene-trait discovery and improving the diagnostic rate for patients with genetic disorders.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022; 2022
Issuance monographic
Language English

Creators/Contributors

Author Yoo, Boyoung
Degree supervisor Bejerano, Gill, 1970-
Thesis advisor Bejerano, Gill, 1970-
Thesis advisor Bernstein, Jonathan A
Thesis advisor Kundaje, Anshul, 1980-
Degree committee member Bernstein, Jonathan A
Degree committee member Kundaje, Anshul, 1980-
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Boyoung Yoo.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2022.
Location https://purl.stanford.edu/rk051yn2117

Access conditions

Copyright
© 2022 by Boyoung Yoo
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
