Change-point models on point processes and applications in genomics
Abstract/Contents
- Abstract
- In this thesis, we present advances in several change-point problems and their applications to genomic problems that arise from massively parallel sequencing. Change-point problems are concerned with abrupt changes in the generating distribution of a stochastic process evolving over time, space, or any ordered set. This thesis focuses on a number of change-point models and inference problems on point processes. We provide a change-point model and efficient algorithms to detect change-points in the relative intensity of non-homogeneous Poisson processes. A model selection approach is constructed in the spirit of the classical Bayesian Information Criterion, but tailored to the irregularity of change-point problems. We review an array of inference problems surrounding the change-point construct and propose a point-wise Bayesian credible interval for the parameter of the generating distribution for exponential families. An asymptotic result on the relationship between frequentist and Bayesian change-point estimators is shown. We investigate how data characteristics, such as sample size, signal strength, and change-point location, influence the inference procedures through a simulation study. On the application front, modern massively parallel sequencing generates enormous and rich data with much systematic and random noise. We provide a survey of the sequencing technologies and some statistical challenges in various steps of sequencing data analysis. A recent application of sequencing in population and tumor genomics is the profiling of genome copy number and detection of copy number variations across samples. We demonstrate in this thesis that sequencing reads can be viewed naturally as a stochastic process along the genome. Copy number variants are modeled as abrupt jumps in the read intensity function. This modeling assumption resembles the biological reality of mutations that lead to copy number change.
We demonstrate the application of change-point methods on actual sequencing data. Our method is found to compare favorably against a commonly used existing method in a spike-in simulation study. Finally, we discuss a direction in which our change-point methods can be extended. It is often of interest to find recurrent copy number variants among a collection of biological samples. We review existing array-based multi-sample copy number profiling methods. Estimation and model selection procedures for the multi-sample sequencing setting are derived as extensions of our two-sample methods. A key challenge is the treatment of carrier status, that is, whether a sample carries the recurrent variant in question. We present two sets of methods, one based on the assumption that all samples are carriers and the other based on a known carrier set. The statistical characteristics of the two methods are compared in a number of simulation scenarios.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2011 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Shen, Jeremy Jiaqi | |
---|---|---|
Associated with | Stanford University, Department of Statistics | |
Primary advisor | Zhang, Nancy R. (Nancy Ruonan) | |
Thesis advisor | Zhang, Nancy R. (Nancy Ruonan) | |
Thesis advisor | Siegmund, David, 1941- | |
Thesis advisor | Wong, Wing Hung | |
Advisor | Siegmund, David, 1941- | |
Advisor | Wong, Wing Hung |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Jeremy J. Shen. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph. D.)--Stanford University, 2011. |
Location | electronic resource |
Access conditions
- Copyright
- © 2011 by Jeremy Jiaqi Shen
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).