Rethinking single-cell RNA-Seq analysis

Zhang, Jesse Min

Abstract/Contents

Abstract: Since the Human Genome Project was completed in 2003, scientists have developed technologies for measuring the RNA content of a single cell. In the last decade, the number of individual cells profiled per study has grown exponentially, to over 1,000,000 cells. In this thesis, I discuss some of the computational and statistical challenges associated with the analysis of such large single-cell datasets. After introducing background information, the thesis covers three main works. The first work introduces a novel, interpretable framework designed with the biologist end user in mind. The framework also addresses the subjectivity of clustering by grounding its results in a rigorous definition of cell type. This allows us to cluster using feature selection and to uncover multiple levels of biologically meaningful populations in the data. The second work considers a novel approach for representing single-cell RNA-Seq data. We argue that gene or transcript expression vectors, while intuitive, are not the optimal way to represent single-cell genomic profiles. Rather than counting the reads that come from each transcript, which requires resolving the ambiguity of multimapping reads, we count the reads that come from each transcript set. We show that these new representations are both more computationally efficient to obtain and more information-rich. The third, and perhaps most interesting, work identifies a post-selection inference problem in standard single-cell computational pipelines. Standard pipelines perform differential analysis after clustering on the same dataset, and reusing the data in this way produces artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework that corrects for this problem.
In summary, we discuss multiple works for drawing key insights from single-cell RNA-Seq data: a clustering method that emphasizes interpretability of results, a representation of single cells that retains more information from read data, and a framework for correcting the selection bias from standard analysis pipelines.
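The post-selection inference problem described in the abstract can be illustrated with a small simulation. This is a hedged sketch, not the thesis's correction framework: it assumes a single gene whose expression is drawn from one Gaussian population (so no cell type structure exists), and it uses a hypothetical 1D 2-means step (`two_means_1d`) and a normal-approximation Welch test (`welch_z_pvalue`) as stand-ins for a real pipeline's clustering and differential-expression steps.

```python
import random
import statistics
from statistics import NormalDist


def two_means_1d(values, n_iter=50):
    """Hypothetical stand-in for clustering: Lloyd's k-means (k=2) on scalars."""
    lo, hi = min(values), max(values)  # initialise centroids at the extremes
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if not group0 or not group1:
            break
        lo, hi = statistics.fmean(group0), statistics.fmean(group1)
    return labels


def welch_z_pvalue(a, b):
    """Two-sided p-value for a difference in means, using a normal
    approximation to Welch's t statistic (reasonable for large groups)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * NormalDist().cdf(-abs(z))


rng = random.Random(0)
# Null data: one homogeneous population, so the gene is not truly differential.
expression = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Double dipping: cluster on the gene, then test that same gene between clusters.
labels = two_means_1d(expression)
c0 = [v for v, lab in zip(expression, labels) if lab == 0]
c1 = [v for v, lab in zip(expression, labels) if lab == 1]
p_double_dip = welch_z_pvalue(c0, c1)

# Control: a split that is chosen without looking at the data.
half = len(expression) // 2
p_random_split = welch_z_pvalue(expression[:half], expression[half:])

print(f"p after clustering on the same data: {p_double_dip:.3g}")
print(f"p for a data-independent split:      {p_random_split:.3g}")
```

Because the clustering step chose its split specifically to separate cells along this gene, the subsequent test on the same cells reports a vanishingly small p-value even though every cell came from the same distribution. The data-independent split, by contrast, yields an ordinary p-value. Correcting for this selection effect is what a valid post-clustering framework must do; simply reporting the naive p-value is not enough.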

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2019; ©2019
Publication date	2019
Issuance	monographic
Language	English

Creators/Contributors

Author	Zhang, Jesse Min
Degree supervisor	Tse, David
Thesis advisor	Tse, David
Thesis advisor	Nishimura, Dwight George
Thesis advisor	Zou, James
Degree committee member	Nishimura, Dwight George
Degree committee member	Zou, James
Associated with	Stanford University, Department of Electrical Engineering.

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Jesse Min Zhang.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2019.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
