Automated discovery of novel gene-trait/disease hypotheses

Yoo, Boyoung

Abstract/Contents

Abstract: Advances in genome sequencing technology give us unprecedented access to read and study the genetic material that encodes every living organism. Despite continuous research efforts to fully understand the genome, our knowledge of gene function is still far from complete. This poses many problems, especially in clinical settings. It is estimated that over 7 million births worldwide are affected by genetic disorders each year, and even though genome sequencing is now available and affordable, many patients remain undiagnosed. Identifying the genetic cause is a labor-intensive task, especially for disorders caused by genes of unknown function or pathogenicity. This dissertation describes machine learning approaches to improve the discovery rate of novel gene functions in two domains: monogenic disorder diagnosis and analysis of inbred mouse strains. These methods highlight the novel gene-phenotype hypotheses that are most likely to be true, to inspire further experimental validation and ultimately expand our gene function knowledge for clinical applications. To offer potential novel diagnostic hypotheses for monogenic disease patients who cannot be diagnosed with current patient-oriented knowledgebases, I introduce InpherNet. InpherNet is a gene prioritization classifier that ranks candidate genes based on the phenotypic annotations of their biological neighbors. It aims to propose novel pathogenic, disease-causing genes when annotation based on previously diagnosed patients is missing or partial. Inbred mouse strains are carefully maintained populations of mice that have gone through successive sibling mating for over 20 generations. Through this repeated inbreeding, each strain has become genetically homogeneous while developing strain-specific, distinctive genotypes and phenotypes. Many genetic factors have been discovered by mapping inter-strain genotype differences against phenotype differences. To accelerate novel functional discoveries using inbred mouse strains, I built AIMHIGH (Analysis of Inbred Mouse strains' High-Impact Genotype-phenotype Hypotheses). AIMHIGH uses experiments that measure phenotypic differences among inbred strains to automatically select trait-relevant candidate genes. Gene-phenotype hypotheses that have not yet been reported are then ranked by a literature-based discovery classifier to propose the most promising candidates. Together, these methods employ machine learning to suggest the most exciting testable hypotheses, with the goal of accelerating novel gene-trait discovery and improving the diagnostic rate for patients with genetic disorders.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2022; ©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Yoo, Boyoung
Degree supervisor	Bejerano, Gill, 1970-
Thesis advisor	Bejerano, Gill, 1970-
Thesis advisor	Bernstein, Jonathan A
Thesis advisor	Kundaje, Anshul, 1980-
Degree committee member	Bernstein, Jonathan A
Degree committee member	Kundaje, Anshul, 1980-
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Boyoung Yoo.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/rk051yn2117

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
