Artificial intelligence methods for discovery in large biobanks

Abstract/Contents

Abstract
Large-scale biobanks, housing vast genetic and phenotypic data, are driving scientific discoveries across a wide range of diseases. However, harnessing the full potential of biobanks requires innovative methodologies that address challenges in phenotype recognition, genetic association studies, and multimorbidity. First, to define disease cohorts accurately, we must recognize phenotypes that may not be labelled in the primary data. To address this challenge, we developed an AI-based method called POPDx (Population-based Objective Phenotyping by Deep Extrapolation) that computes disease liabilities for 12,803 ICD-10 codes and 1,538 Phenotype codes for all participants in the UK Biobank. Second, the genetic data in biobanks are often used to conduct association studies to understand the genetic architecture of key health traits. These studies are commonly set up as case-control designs, but participants labelled "healthy" may become cases over time. We demonstrate that our disease liability estimates from patient phenotyping improve downstream genetic discovery by mapping disease risk to a quantitative scale that provides greater statistical power than dichotomous designs. Finally, multimorbidity (the coexistence of multiple diseases in an individual) provides an opportunity to understand disorders that may share genetic or environmental risk factors. We therefore present ForeSITE (Forecasting Susceptibility to Illness with Transformer Embeddings), an automatic framework powered by a GPT-style architecture that models disease trajectories and can predict likely future diseases. These new capabilities, both alone and in combination, enhance the utility of large-scale biobanks for scientific discovery. In particular, our contributions to phenotype recognition, genetic association studies, and multimorbidity modeling pave the way for improved disease understanding and personalized healthcare interventions.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Yang, Lu, (Researcher in bioengineering)
Degree supervisor Altman, Russ
Thesis advisor Altman, Russ
Thesis advisor Leskovec, Jurij
Thesis advisor Wall, Dennis Paul
Degree committee member Leskovec, Jurij
Degree committee member Wall, Dennis Paul
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Bioengineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Lu Yang.
Note Submitted to the Department of Bioengineering.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/jn115rp0937

Access conditions

Copyright
© 2023 by Lu Yang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
