Change-point models on point processes and applications in genomics

Shen, Jeremy Jiaqi; Stanford University, Department of Statistics

Abstract/Contents

Abstract: In this thesis, we present advances in several change-point problems and their applications to genomic problems that arise from massively parallel sequencing. Change-point problems are concerned with abrupt changes in the generating distribution of a stochastic process evolving over time, space, or any ordered set. This thesis focuses on a number of change-point models and inference problems on point processes. We provide a change-point model and efficient algorithms to detect change-points in the relative intensity of non-homogeneous Poisson processes. A model selection approach is constructed in the spirit of the classical Bayesian Information Criterion, but tailored to the irregularity of change-point problems. We review an array of inference problems surrounding the change-point construct and propose a point-wise Bayesian credible interval for the parameter of the generating distribution for exponential families. An asymptotic result on the relationship between frequentist and Bayesian change-point estimators is shown. We investigate how data characteristics, such as sample size, signal strength, and change-point location, influence the inference procedures through a simulation study. On the application front, modern massively parallel sequencing generates enormous and rich data with much systematic and random noise. We provide a survey of the sequencing technologies and some statistical challenges in various steps of sequencing data analysis. A recent application of sequencing in population and tumor genomics is the profiling of genome copy number and detection of copy number variations across samples. We demonstrate in this thesis that sequencing reads can be viewed naturally as a stochastic process along the genome. Copy number variants are modeled as abrupt jumps in the read intensity function. This modeling assumption resembles the biological reality of mutations that lead to copy number change.
We demonstrate the application of change-point methods on actual sequencing data. Our method is found to compare favorably against a commonly used existing method in a spike-in simulation study. Lastly, we discuss a direction in which our change-point methods can be extended. It is often of interest to find recurrent copy number variants among a collection of biological samples. We review existing array-based multi-sample copy number profiling methods. Estimation and model selection procedures for the multi-sample sequencing setting are derived as extensions of our two-sample methods. A key challenge is the treatment of carrier status, that is, whether a sample carries the recurrent variant in question. We present two sets of methods, one based on the assumption that all samples are carriers and the other based on a known carrier set. The statistical characteristics of the two methods are compared in a number of simulation scenarios.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2011
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Shen, Jeremy Jiaqi
Associated with	Stanford University, Department of Statistics
Primary advisor	Zhang, Nancy R. (Nancy Ruonan)
Thesis advisor	Zhang, Nancy R. (Nancy Ruonan)
Thesis advisor	Siegmund, David, 1941-
Thesis advisor	Wong, Wing Hung

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Jeremy J. Shen.
Note	Submitted to the Department of Statistics.
Thesis	Thesis (Ph. D.)--Stanford University, 2011.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
