Empowering machine learning systems for drug discovery with mechanistic biological knowledge

Abstract/Contents

Abstract
Machine learning has the potential to solve critical tasks in drug discovery, from identifying new therapeutic uses for drugs to personalizing treatment plans for patients. However, datasets in drug discovery are often small because experiments are costly in time and labor, which limits the applicability of machine learning systems. Our key insight is to infuse machine learning systems with prior information about biology so that they can learn efficiently from small labeled datasets. We structure diverse prior information, from molecular interactions between proteins to clinical annotations about diseases, into a heterogeneous knowledge graph. We then develop two machine learning systems that use the knowledge graph to mechanistically model biological phenomena and achieve strong performance. First, we develop the multiscale interactome, a machine learning system that uses a knowledge graph to model how drugs treat diseases across multiple scales of biology. The multiscale interactome predicts which drugs treat a disease up to 40% more effectively than the prior state-of-the-art, identifies proteins and biological functions relevant to treatment, and predicts genes that alter the treatment's efficacy and side effects. Second, we develop PLATO, a deep learning system that uses a knowledge graph to achieve strong performance on tabular datasets with orders of magnitude more features than samples (i.e., "small" labeled datasets). In PLATO, the knowledge graph is auxiliary to the tabular dataset and describes input features, such as genes. PLATO uses the knowledge graph to infer the weights of a multilayer perceptron, thereby using prior information to learn efficiently from a small labeled dataset. Across 6 datasets, PLATO outperforms the prior state-of-the-art by up to 10.19%. Ultimately, we provide a general framework to empower data-driven machine learning systems with extensive, mechanistic knowledge of biology.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Ruiz, Camilo Andres
Degree supervisor Leskovec, Jurij
Thesis advisor Leskovec, Jurij
Thesis advisor Altman, Russ
Thesis advisor Snyder, Michael, Ph. D.
Degree committee member Altman, Russ
Degree committee member Snyder, Michael, Ph. D.
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Bioengineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Camilo Ruiz.
Note Submitted to the Department of Bioengineering.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/mc795nz5480

Access conditions

Copyright
© 2023 by Camilo Andres Ruiz
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
