Adaptable biophysically-interpretable neural networks in genomics and biomedicine


Abstract/Contents

Abstract
Since the first assembly of the human genome in 2003, artificial intelligence and biology have both been improving at an astonishing rate, often synergistically. High-throughput technologies have created an opportunity to re-envision biology and medicine using novel computational techniques applied to vast datasets. Already, deep learning is beginning to impact biological research due to its ability to learn arbitrarily complex relationships from large-scale data. However, these advances create new challenges: How do we reconcile model predictions with existing knowledge and paradigms? How do we enable the use of neural network models as in silico oracles to assess hypotheses and guide experiments? How do we safely deploy deep learning systems and establish trust with researchers and practitioners, who require guarantees and a rationale for decision making? This thesis attempts to address these questions in two parts. In the first part, we focus on deep learning models of transcription factor (TF) binding, which have had striking success in modeling in vivo binding at nucleotide resolution. We present AffinityDistillation, which leverages neural network models to perform novel in silico marginalization experiments at large scale to extract thermodynamic affinities of TF-DNA interactions, thereby generating quantitative predictions that can be tested in follow-up in vitro experiments. In addition to providing biophysical interpretations of neural network predictions, AffinityDistillation enables the use of neural network models as in silico biophysical oracles to assess whether and how certain in vitro phenomena manifest themselves in vivo. The second part of this thesis focuses on the safe deployment of deep learning systems, ensuring that they are adaptable to distributional shifts, particularly label shift.
Label shift refers to the phenomenon where the prior class probability p(y) changes between the training and test distributions, while the conditional probability p(x|y) stays fixed. Label shift arises in biomedical settings, where a classifier trained to predict disease given symptoms must be adapted to scenarios where the baseline prevalence of the disease is different. Here we (1) show that combining maximum likelihood with a type of calibration called bias-corrected calibration outperforms previous methods across diverse datasets and distribution shifts, (2) prove that the maximum likelihood objective is concave, and (3) introduce a principled strategy for estimating source-domain priors that improves robustness to poor calibration. Furthermore, by using calibrated probabilities as a proxy for the true class labels, we can estimate the change in an arbitrary metric due to abstentions. Leveraging this, we present a general framework for abstention that can be applied to optimize any metric of interest, that is adaptable to label shift at test time, and that works out-of-the-box with any classifier that can be calibrated. Altogether, the computational approaches developed in this thesis may aid efforts to understand the genome and improve human health.
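
As an illustration of the label shift setting described in the abstract, the following is a minimal sketch of a standard expectation-maximization iteration for maximum likelihood estimation of target-domain class priors from calibrated classifier outputs. It is not taken from the thesis and may differ from the author's implementation: the function name `em_label_shift` and its parameters are hypothetical, the input probabilities are assumed to be already calibrated, and the bias-corrected calibration step and the source-prior estimation strategy mentioned in the abstract are not shown.

```python
import numpy as np


def em_label_shift(probs, source_priors, n_iter=500, tol=1e-8):
    """Estimate target-domain class priors under label shift (illustrative sketch).

    probs: (n_samples, n_classes) array of calibrated source-domain
        posteriors p_s(y|x), evaluated on unlabeled target-domain samples.
    source_priors: (n_classes,) array of source-domain priors p_s(y).
    Returns (target_priors, adapted_probs).
    """
    probs = np.asarray(probs, dtype=float)
    source_priors = np.asarray(source_priors, dtype=float)
    target_priors = source_priors.copy()

    def adapt(priors):
        # Because p(x|y) is assumed fixed, the target posterior is the source
        # posterior reweighted by the prior ratio, then renormalized per sample.
        weighted = probs * (priors / source_priors)
        return weighted / weighted.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: adapted posteriors under the current prior estimate.
        adapted = adapt(target_priors)
        # M-step: the new prior estimate is the mean adapted posterior.
        new_priors = adapted.mean(axis=0)
        converged = np.abs(new_priors - target_priors).max() < tol
        target_priors = new_priors
        if converged:
            break

    return target_priors, adapt(target_priors)
```

In the disease-prevalence example from the abstract, `probs` would hold the classifier's calibrated disease probabilities for patients in the new setting, and the returned priors would estimate the new baseline prevalence. How well the estimate tracks the true prevalence depends on how well calibrated the inputs are.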

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Mohamed, Amr Mohamed Sayed Ahmed
Degree supervisor Fordyce, Polly
Degree supervisor Kundaje, Anshul, 1980-
Thesis advisor Fordyce, Polly
Thesis advisor Kundaje, Anshul, 1980-
Thesis advisor Horowitz, Mark (Mark Alan)
Degree committee member Horowitz, Mark (Mark Alan)
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Amr Mohamed.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/ms554gg5621

Access conditions

Copyright
© 2023 by Amr Mohamed Sayed Ahmed Mohamed
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
