A sequence-to-sequence regression of genome-wide chromatin data through adversarial training

Min, Jesik; Stanford University, Department of Computer Science

Abstract/Contents

Abstract: The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) reveals information about open chromatin regions, individual nucleosomes, and chromatin compaction at nucleotide resolution using only 500 to 50,000 cells. In contrast, Chromatin Immunoprecipitation sequencing (ChIP-seq) requires far more biological material, typically millions of cells, to detect DNA regions with histone modifications. Regressing histone ChIP-seq data from less costly ATAC-seq data would therefore help us map missing histone marks and understand epigenomic activity more efficiently. This paper investigates how our modified deep adversarial training approach can be used to predict ChIP-seq signal from ATAC-seq signal. We begin by establishing the performance of a convolutional neural network (CNN) model as a baseline. We then introduce three modifications to the widely used adversarial network architecture. First, we modify the generator component of the adversarial network so that it takes ATAC-seq signal as input instead of random noise and generates ChIP-seq signal from it. Second, we propose a composite objective function that combines two losses: mean squared error and adversarial loss. Third, we apply one-sided label smoothing, which is essential for stabilizing the adversarial training. The generator trained with our new adversarial training approach achieves a Pearson correlation of 0.562 with respect to the actual ChIP-seq signal, outperforming the CNN baseline. We also conduct a qualitative analysis of how adversarial training based on the composite objective function helps the model predict ChIP-seq peaks from ATAC-seq signal. To the best of our knowledge, this is the first attempt to tackle the epigenomic signal imputation task using deep adversarial training.
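The three ingredients named in the abstract (a composite generator objective, one-sided label smoothing, and Pearson correlation as the evaluation metric) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the smoothing target of 0.9, the adversarial weight of 0.1, and the non-saturating form of the generator's adversarial term are all assumptions chosen for illustration.

```python
import math

def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one probability and one (possibly soft) target."""
    p = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake, smooth=0.9):
    """One-sided label smoothing: only the real targets are softened.

    d_real / d_fake are discriminator probabilities for real and generated
    ChIP-seq signal. smooth=0.9 is an assumed value, not from the thesis.
    """
    real_term = sum(bce(p, smooth) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)  # fake targets stay 0
    return real_term + fake_term

def generator_loss(pred, target, d_fake, adv_weight=0.1):
    """Composite objective: mean squared error plus a weighted adversarial term.

    adv_weight=0.1 is an assumed weighting, not from the thesis.
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    adv = sum(bce(p, 1.0) for p in d_fake) / len(d_fake)  # generator wants "real"
    return mse + adv_weight * adv

def pearson(x, y):
    """Pearson correlation between predicted and observed signal tracks."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0  # constant tracks: treated as 0 here
```

With smoothing applied only to the real labels, a discriminator that outputs 0.999 on real signal is penalized more than one that outputs 0.9, which is the intended damping of overconfident predictions; the fake side is left untouched.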

Description

Type of resource	text
Date created	June 15, 2018

Creators/Contributors

Author	Min, Jesik
Author	Stanford University, Department of Computer Science
Primary advisor	Israeli, Johnny
Principal investigator	Kundaje, Anshul

Subjects

Subject	Stanford School of Engineering
Subject	Department of Computer Science
Subject	computational biology
Subject	deep learning
Subject	genomics
Subject	ATAC-seq
Subject	ChIP-seq
Subject	generative adversarial network
Genre	Thesis

Bibliographic information

Location	https://purl.stanford.edu/bp764rj2572

Access conditions

Use and reproduction: User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).

Preferred citation

Preferred Citation: Min, Jesik. (2018). A sequence-to-sequence regression of genome-wide chromatin data through adversarial training. Stanford Digital Repository. Available at: https://purl.stanford.edu/bp764rj2572

Collection

Undergraduate Theses, School of Engineering

Contact information

Contact: serendipity9210@gmail.com
