Change-point models on point processes and applications in genomics

Abstract/Contents

Abstract
In this thesis, we present advancements in some change-point problems and their applications to genomic problems that arise from massively parallel sequencing. Change-point problems are concerned with abrupt changes in the generating distribution of a stochastic process evolving over time, space, or any ordered set. This thesis focuses on a number of change-point models and inference problems on point processes. We provide a change-point model and efficient algorithms to detect change-points in the relative intensity of non-homogeneous Poisson processes. A model selection approach is constructed in the spirit of the classical Bayesian Information Criterion, but tailored to the irregularity of change-point problems. We review an array of inference problems surrounding the change-point construct and propose a point-wise Bayesian credible interval for the parameter of the generating distribution for exponential families. An asymptotic result on the relationship between frequentist and Bayesian change-point estimators is shown. We investigate, through a simulation study, how data characteristics such as sample size, signal strength, and change-point location influence the inference procedures. On the application front, modern massively parallel sequencing generates enormous and rich data with much systematic and random noise. We provide a survey of the sequencing technologies and some statistical challenges in various steps of sequencing data analysis. A recent application of sequencing in population and tumor genomics is the profiling of genome copy number and detection of copy number variations across samples. We demonstrate in this thesis that sequencing reads can be viewed naturally as a stochastic process along the genome. Copy number variants are modeled as abrupt jumps in the read intensity function. This modeling assumption resembles the biological reality of mutations that lead to copy number change.
We demonstrate the application of change-point methods on actual sequencing data. Our method is found to compare favorably against a commonly used existing method in a spike-in simulation study. Lastly, we discuss a direction in which our change-point methods can be extended. It is often of interest to find recurrent copy number variants among a collection of biological samples. We review existing array-based multi-sample copy number profiling methods. Estimation and model selection procedures for the multi-sample sequencing setting are derived as extensions of our two-sample methods. A key challenge is the treatment of carrier status, that is, whether a sample carries the recurrent variant in question. We present two sets of methods, one based on the assumption that all samples are carriers and the other based on a known carrier set. The statistical characteristics of the two methods are compared in a number of simulation scenarios.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with Shen, Jeremy Jiaqi
Associated with Stanford University, Department of Statistics
Primary advisor Zhang, Nancy R. (Nancy Ruonan)
Thesis advisor Zhang, Nancy R. (Nancy Ruonan)
Thesis advisor Siegmund, David, 1941-
Thesis advisor Wong, Wing Hung

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Jeremy J. Shen.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph. D.)--Stanford University, 2011.
Location electronic resource

Access conditions

Copyright
© 2011 by Jeremy Jiaqi Shen
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
