A sequence-to-sequence regression of genome-wide chromatin data through adversarial training


Abstract/Contents

Abstract
The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) reveals information about open chromatin regions, individual nucleosomes, and chromatin compaction at nucleotide resolution using only 500 to 50,000 cells. In contrast, Chromatin Immunoprecipitation sequencing (ChIP-seq) requires far more biological material, typically millions of cells, to detect DNA regions with histone modifications. Regressing histone ChIP-seq data from less costly ATAC-seq data would therefore help us map missing histone marks and understand epigenomic activity more efficiently. This paper investigates how our modified deep adversarial training approach can be used to predict ChIP-seq signal from ATAC-seq signal. We begin by establishing a convolutional neural network (CNN) model as a performance baseline. We then introduce three modifications to the widely used adversarial network architecture. First, we modify the generator component of the adversarial network so that it takes ATAC-seq signal as input instead of random noise and generates ChIP-seq signal from it. Second, we propose a composite objective function based on two different losses: mean squared error and adversarial loss. Third, we apply one-sided label smoothing, which is essential for stabilizing the adversarial training. The generator trained with our new adversarial training approach achieves a Pearson correlation of 0.562 with respect to the actual ChIP-seq signal, outperforming the CNN baseline. We also conduct a qualitative analysis of how adversarial training based on the composite objective function helps the model predict ChIP-seq peaks from ATAC-seq signal. To the best of our knowledge, this is the first attempt to tackle the epigenomic signal imputation task using deep adversarial training.
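
The composite objective, one-sided label smoothing, and Pearson evaluation named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the smoothing target of 0.9, the adversarial weight `adv_weight`, and the non-saturating form of the generator's adversarial term are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def bce(probs, targets, eps=1e-7):
    """Binary cross-entropy averaged over all elements."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    t = np.asarray(targets, dtype=float)
    return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))


def discriminator_loss(d_real, d_fake, smooth=0.9):
    """One-sided label smoothing: real targets become `smooth` (assumed 0.9),
    while targets for generated samples stay at exactly 0."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    real_targets = np.full(d_real.shape, smooth)
    fake_targets = np.zeros(d_fake.shape)
    return bce(d_real, real_targets) + bce(d_fake, fake_targets)


def generator_loss(pred_chip, true_chip, d_fake, adv_weight=0.1):
    """Composite objective: mean squared error plus a weighted adversarial term.
    `adv_weight` is a hypothetical hyperparameter, not a value from the thesis."""
    pred_chip = np.asarray(pred_chip, dtype=float)
    true_chip = np.asarray(true_chip, dtype=float)
    mse = float(np.mean((pred_chip - true_chip) ** 2))
    # Non-saturating adversarial term: the generator wants D(fake) close to 1.
    adv = bce(d_fake, np.ones(np.shape(d_fake)))
    return mse + adv_weight * adv


def pearson(x, y):
    """Pearson correlation between two signal tracks, flattened to 1-D."""
    x = np.ravel(x) - np.mean(x)
    y = np.ravel(y) - np.mean(y)
    return float(np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2)))
```

In this sketch `d_real` and `d_fake` stand for the discriminator's output probabilities on measured and generated ChIP-seq tracks. Smoothing only the real-side target penalizes an overconfident discriminator without rewarding it for accepting generated samples, which is the usual motivation for making the smoothing one-sided.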

Description

Type of resource text
Date created June 15, 2018

Creators/Contributors

Author Min, Jesik
Author Stanford University, Department of Computer Science
Primary advisor Israeli, Johnny
Principal investigator Kundaje, Anshul

Subjects

Subject Stanford School of Engineering
Subject Department of Computer Science
Subject computational biology
Subject deep learning
Subject genomics
Subject ATAC-seq
Subject ChIP-seq
Subject generative adversarial network
Genre Thesis

Bibliographic information

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Preferred citation

Min, Jesik. (2018). A sequence-to-sequence regression of genome-wide chromatin data through adversarial training. Stanford Digital Repository. Available at: https://purl.stanford.edu/bp764rj2572

Collection

Undergraduate Theses, School of Engineering
