Adaptable biophysically-interpretable neural networks in genomics and biomedicine

Mohamed, Amr Mohamed Sayed Ahmed

Abstract/Contents

Abstract: Since the first assembly of the human genome in 2003, artificial intelligence and biology have both been improving at an astonishing rate, often synergistically. High-throughput technologies have created an opportunity to re-envision biology and medicine using novel computational techniques applied to vast datasets. Already, deep learning is beginning to impact biological research due to its ability to learn arbitrarily complex relationships from large-scale data. However, these advances create new challenges: How do we resolve and incorporate model predictions within existing knowledge and paradigms? How do we enable the use of neural network models as in silico oracles to assess hypotheses and guide experiments? How do we safely deploy deep learning systems and establish trust with researchers and practitioners, who require guarantees and a rationale for decision making? This thesis attempts to address these questions in two parts. In the first part, we focus on deep learning models of transcription factor (TF) binding, which have had striking successes modeling in vivo binding at nucleotide resolution. We present AffinityDistillation, which leverages neural network models to perform novel in silico marginalization experiments at large scale to extract thermodynamic affinities of TF-DNA interactions, thereby generating quantitative predictions that can be tested in follow-up in vitro experiments. In addition to providing biophysical interpretations of neural network predictions, AffinityDistillation enables the use of neural network models as in silico biophysical oracles to assess whether and how certain in vitro phenomena manifest themselves in vivo. The second part of this thesis is focused on the safe deployment of deep learning systems to ensure they are adaptable to distributional shifts, particularly label shift.
Label shift refers to the phenomenon where the prior class probability p(y) changes between the training and test distributions, while the conditional probability p(x|y) stays fixed. Label shift arises in biomedical settings, where a classifier trained to predict disease given symptoms must be adapted to scenarios where the baseline prevalence of the disease is different. Here we (1) show that combining maximum likelihood with a type of calibration called bias-corrected calibration outperforms previous methods across diverse datasets and distribution shifts, (2) prove that the maximum likelihood objective is concave, and (3) introduce a principled strategy for estimating source-domain priors that improves robustness to poor calibration. Furthermore, by using calibrated probabilities as a proxy for the true class labels, we can estimate the change in any arbitrary metric due to abstentions. Leveraging this, we present a general framework for abstention that can be applied to optimize any metric of interest, that is adaptable to label shift at test time, and that works out-of-the-box with any classifier that can be calibrated. Altogether, the computational approaches developed in this thesis can be of some use in the endeavor to understand the genome and better human health.
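
Illustrative sketch (not taken from the thesis): the label-shift adaptation described in the abstract can be pictured with the standard expectation-maximization procedure for maximizing the likelihood of unlabeled target data over the target class priors. The Python code below is a minimal version under stated assumptions: the function name `em_label_shift` and its parameters are hypothetical, the input probabilities are assumed to be already calibrated on the source domain, and the bias-corrected calibration step and the source-prior estimation strategy mentioned in the abstract are not shown.

```python
import numpy as np


def em_label_shift(probs, source_priors, n_iter=1000, tol=1e-8):
    """Estimate target-domain class priors with EM under label shift.

    probs: (n, k) array of calibrated source-domain posteriors p_s(y | x),
        evaluated on unlabeled target-domain examples (assumed input).
    source_priors: length-k array of source-domain priors p_s(y).
    Returns (target_priors, adapted_probs).
    """
    probs = np.asarray(probs, dtype=float)
    source_priors = np.asarray(source_priors, dtype=float)
    target_priors = source_priors.copy()  # start from the source priors

    for _ in range(n_iter):
        # E-step: reweight each posterior by the ratio of target to source
        # priors, then renormalize each row so it sums to one.
        weighted = probs * (target_priors / source_priors)
        adapted = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: the new prior estimate is the mean adapted posterior.
        new_priors = adapted.mean(axis=0)
        converged = np.abs(new_priors - target_priors).max() < tol
        target_priors = new_priors
        if converged:
            break

    # Recompute adapted posteriors with the final prior estimate.
    weighted = probs * (target_priors / source_priors)
    adapted = weighted / weighted.sum(axis=1, keepdims=True)
    return target_priors, adapted
```

In a hypothetical use, `probs` would hold a calibrated classifier's outputs on unlabeled test examples and `source_priors` the class frequencies seen in training; the returned `adapted` probabilities are the same predictions re-expressed under the estimated test-time prevalence. Because the objective is concave in the priors, as the abstract states, the starting point should not affect the optimum that EM approaches.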

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023; 2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Mohamed, Amr Mohamed Sayed Ahmed
Degree supervisor	Fordyce, Polly
Degree supervisor	Kundaje, Anshul, 1980-
Thesis advisor	Fordyce, Polly
Thesis advisor	Kundaje, Anshul, 1980-
Thesis advisor	Horowitz, Mark (Mark Alan)
Degree committee member	Horowitz, Mark (Mark Alan)
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Amr Mohamed.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2024.
Location	https://purl.stanford.edu/ms554gg5621

Access conditions

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
