Differential expression identification and false discovery rate estimation in RNA-Seq data

Li, Jun, (Statistician); Stanford University, Department of Statistics



Abstract/Contents

Abstract: RNA-Seq is becoming the primary tool for measuring genome-wide transcript expression. We discuss the identification of features (genes, isoforms, exons, etc.) that are differentially expressed between samples from different biological conditions or disease statuses. Besides finding the right set of significant features, we emphasize accurately estimating the corresponding false discovery rate (FDR). RNA-Seq data take the form of counts, so models based on the Gaussian distribution are generally unsuitable. Also, different sequencing experiments have very different sequencing depths, which need to be estimated accurately and then used to normalize the data. Current methods model counts with Gaussian, Poisson, or negative binomial distributions, and they apply the Benjamini-Hochberg procedure for FDR estimation. They have obvious limitations: (1) they are applicable only to two-class data, not to quantitative or survival data; (2) they are sensitive to violations of their distributional assumptions and often fail completely when outliers are present in the data; and (3) their FDR estimates can often be inaccurate. To overcome these difficulties, we propose two novel methods, one parametric and one nonparametric. Our parametric method uses a new permutation plug-in procedure for estimating FDR, and our nonparametric method uses a novel resampling strategy for normalizing the count data. Both methods can be applied to different types of RNA-Seq data. The parametric method is less sensitive to violations of its distributional assumptions, and the nonparametric method is very robust even to outliers. Both often give reliable FDR estimates in cases where other methods cannot. Although we mainly discuss the identification of differentially expressed genes in RNA-Seq data, the two methods we develop should be equally applicable to data generated by other sequencing technologies, such as DNA-Seq, ChIP-Seq, and 3SEQ.
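The abstract contrasts the standard Benjamini-Hochberg procedure with a permutation plug-in estimate of FDR. A minimal Python/NumPy sketch of both ideas follows for orientation. The function names are hypothetical, and the plug-in estimator shown is the generic permutation form (average number of features exceeding a cutoff under permuted labels, divided by the number of features called at that cutoff, optionally scaled by an estimate of the null proportion pi0). It is not claimed to be the specific procedure developed in the thesis.

```python
import numpy as np


def bh_adjust(pvals):
    """Standard Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted p-values
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity: running minimum from the largest p-value downward
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def permutation_plugin_fdr(observed, permuted, threshold, pi0=1.0):
    """Generic plug-in FDR estimate for calling features with |stat| >= threshold.

    observed: shape (m,) statistics, one per feature (hypothetical input)
    permuted: shape (B, m) statistics recomputed after B label permutations
    pi0:      assumed proportion of null features; 1.0 is the conservative choice
    """
    obs = np.abs(np.asarray(observed, dtype=float))
    perm = np.abs(np.asarray(permuted, dtype=float))
    n_called = np.sum(obs >= threshold)
    if n_called == 0:
        return 0.0
    # average number of features that exceed the cutoff when labels carry no signal
    n_false = np.mean(np.sum(perm >= threshold, axis=1))
    return min(1.0, pi0 * n_false / n_called)
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.2])` returns approximately `[0.04, 0.0533, 0.0533, 0.2]`. The plug-in form needs only a statistic and a way to permute the outcome, which is one reason permutation-based FDR estimates are not tied to two-class designs.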

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2012
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Li, Jun, (Statistician)
Associated with	Stanford University, Department of Statistics
Primary advisor	Tibshirani, Robert
Thesis advisor	Tibshirani, Robert
Thesis advisor	Hastie, Trevor
Thesis advisor	Wong, Wing Hung
Advisor	Hastie, Trevor
Advisor	Wong, Wing Hung

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Jun Li.
Note	Submitted to the Department of Statistics.
Thesis	Thesis (Ph.D.)--Stanford University, 2012.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
