Deciphering regulatory DNA with deep learning models and interpretation methods

Nair, Surag

Abstract/Contents

Abstract: Accurate predictive modeling of gene regulation is crucial for a fundamental understanding of cell identity and function. High-throughput profiling of diverse biochemical and functional properties of cells has enabled powerful deep-learning-based DNA sequence models that predict protein-DNA binding, chromatin accessibility, and histone marks across cell types with state-of-the-art accuracy. Interpretation of these DNA sequence models has revealed novel insights into the cis-regulatory code of TF binding, the effects of sequence variation and repeats, and the sequence basis of chromatin accessibility. However, there is much scope for enhancing modeling strategies, model performance, and the tooling and infrastructure around model development and interpretation. Moreover, the full potential of these models for extracting biological insights from high-throughput functional profiling data has not been realized. In this thesis, I present novel methods that advance DNA sequence models for regulatory genomics, along with applications of these models to glean insights into biological systems. First, I introduce ChromDragoNN, a method that enables DNA sequence models to generalize to predictions in new cell types. Next, I describe fastISM, an algorithm that significantly speeds up variant scoring for convolutional neural networks. I then present dynseq, a tool for sharing and visualizing model-derived importance scores of individual bases. I then apply DNA sequence models to two different biological systems. First, I combine single-cell chromatin accessibility profiling with DNA sequence models to nominate regulatory DNA variants associated with eye disorders. Next, I apply DNA sequence models to single-cell chromatin accessibility data from a time course of human skin cells transforming into induced pluripotent stem cells over four weeks. Using DNA sequence models, I reveal mechanistic insights into reprogramming progression by linking transcription factor abundance changes to the sequence logic encoded in regulatory elements. Together, this thesis advances predictive modeling and analysis of gene regulation through new methods, tools, and biological applications. I hope that the work moves the field closer to realizing the full potential of DNA sequence models for understanding cell identity and function.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Nair, Surag
Degree supervisor	Kundaje, Anshul, 1980-
Thesis advisor	Kundaje, Anshul, 1980-
Thesis advisor	Engreitz, Jesse
Thesis advisor	Horowitz, Mark (Mark Alan)
Degree committee member	Engreitz, Jesse
Degree committee member	Horowitz, Mark (Mark Alan)
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Surag Nair.
Note	Submitted to the Computer Science Department.
Thesis	Thesis (Ph.D.), Stanford University, 2023.
Location	https://purl.stanford.edu/mz621td1032

Access conditions

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
