Computational algorithms and statistical models for ChIP sequencing analysis

Ma, Wenxiu; Stanford University, Computer Science Department



Abstract/Contents

Abstract: Chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) has been widely used since 2007 to study the genome-wide localization of protein-DNA interactions. This powerful technique provides comprehensive and high-resolution protein-DNA binding data for the identification of cis-regulatory elements, which is important for deciphering the underlying transcriptional gene regulatory mechanisms. In this dissertation, I developed a computational and statistical framework for the analysis of ChIP-seq data. A typical pipeline for analyzing ChIP-seq data is presented and discussed, including data exploration and visualization, background estimation, peak detection, genomic annotation, and motif analysis. In particular, I developed and implemented a series of peak detection algorithms and methods for transcription factor (TF)-binding ChIP-seq data, for both single-replicate and multiple-replicate ChIP-seq datasets. For single-replicate peak calling, I developed an iterative conditional Binomial model for the two-sample problem (when both the treated ChIP sample and the negative control sample are available). This iterative method is computationally efficient and provides an accurate estimate of the joint background distribution between the ChIP and negative control samples. Compared to other two-sample peak callers, our method achieves higher sensitivity in peak calling and sharper motif resolution in detected peak regions. For the multiple-replicate problem, I put forward a hierarchical Negative Binomial model to assess binding signal variations among multiple ChIP-seq biological replicates. A closed-form empirical Bayes estimator of the expected peak signal is developed by pooling information from all candidate peak regions. This estimator significantly reduces the variance of the estimation error and increases sensitivity in detecting binding loci of interest, especially when the number of replicates is small. Our method outperforms existing heuristic approaches for multiple-replicate peak calling, including the pooling approach and the intersection approach. The computational algorithms and statistical models developed for ChIP-seq data analysis are applied and evaluated in a study of the Sonic Hedgehog (Shh) signaling pathway in the ventral neural tube of mouse embryos. Through an integrative analysis of transcription profiling data, ChIP-seq data, functional annotation, and motif discovery, we identified hundreds of novel cis-regulatory elements mediated by Gli1, predicted potential co-binding partners of Gli1, and gained insights into Gli functions in mouse embryonic ventral neural tube development.
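
Illustrative sketch (not taken from the dissertation): the abstract does not give the details of the iterative conditional Binomial model, so the Python below shows only the textbook conditional binomial test that two-sample comparisons of this kind usually build on. It assumes that, under the null hypothesis, read counts in a window are Poisson in both samples and differ only by a sequencing-depth ratio; conditioning on the window total then makes the ChIP count Binomial. The function name and the `depth_ratio` parameter are hypothetical, and the iterative background estimation mentioned in the abstract is not reproduced here.

```python
from math import comb


def conditional_binomial_pvalue(chip: int, control: int, depth_ratio: float = 1.0) -> float:
    """Upper-tail p-value for the ChIP count in one window, given the window total.

    Assumption (not from the source): under the null, ChIP and control counts are
    Poisson with background rates in the proportion depth_ratio : 1, so that
    chip | (chip + control) ~ Binomial(n, p0) with p0 = depth_ratio / (1 + depth_ratio).
    """
    if chip < 0 or control < 0:
        raise ValueError("read counts must be non-negative")
    if depth_ratio <= 0:
        raise ValueError("depth_ratio must be positive")

    n = chip + control
    p0 = depth_ratio / (1.0 + depth_ratio)

    # P(X >= chip) for X ~ Binomial(n, p0), summed term by term.
    return sum(comb(n, k) * p0**k * (1.0 - p0) ** (n - k) for k in range(chip, n + 1))


# Example windows as (ChIP reads, control reads); the numbers are made up.
windows = [(30, 5), (5, 5), (2, 0)]
for chip_reads, control_reads in windows:
    p = conditional_binomial_pvalue(chip_reads, control_reads, depth_ratio=1.0)
    print(f"chip={chip_reads:3d} control={control_reads:3d} p-value={p:.3g}")
```

Summing the terms directly is adequate for small windows; for large read counts a log-space or library implementation of the binomial tail would be numerically safer.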

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2012
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Ma, Wenxiu
Associated with	Stanford University, Computer Science Department
Primary advisor	Batzoglou, Serafim
Primary advisor	Wong, Wing Hung
Thesis advisor	Batzoglou, Serafim
Thesis advisor	Wong, Wing Hung
Thesis advisor	Dill, David L
Advisor	Dill, David L

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Wenxiu Ma.
Note	Submitted to the Department of Computer Science.
Thesis	Thesis (Ph.D.)--Stanford University, 2012.
Location	electronic resource
