Understanding cell identity with single cell transcriptomics
- In my thesis work, I use single-cell whole-transcriptome sequencing to reveal new insights into cell identity: when cell types arise in development, how cell types are patterned in the adult, how splicing and transcription factors are modulated by cell identity, and which molecules may be responsible for generating these patterns. In the first study, in collaboration with Ozgun Gokce and Thomas Sudhof, I sequenced neurons from the mouse striatum, a large brain region involved in Parkinson's and Huntington's disease. I created a well-resolved classification of cell types in the mouse striatum; transcriptome analysis revealed 10 distinct differentiated cell types, including neurons, astrocytes, oligodendrocytes, ependymal, immune, and vascular cells, and enabled the discovery of numerous novel marker genes. I further explored neuronal heterogeneity in the adult murine striatum by combining single-cell RNA-seq of spiny projection neurons (SPNs) with quantitative RNA in situ hybridization (ISH) using the RNAscope platform. I developed a novel computational algorithm that distinguishes discrete from continuous cell identities in scRNA-seq data, and used it to show that SPNs in the striatum can be classified into four major discrete types with little overlap and no implied spatial relationship. I found that these discrete classes vary continuously along multiple spatial gradients of expression; these gradients define anatomical location by a combinatorial mechanism. I used this information to support the description of a novel region of the striatum. Broadly, our results suggest that neuronal circuitry has a substructure at far higher resolution than is typically interrogated, defined by the precise identity and location of each neuron. In a collaboration with Rahul Sinha and Irving Weissman, I discovered and investigated an artifact in Illumina sequencing data.
Illumina-based next-generation sequencing (NGS) has accelerated biomedical discovery through its ability to generate thousands of gigabases of sequencing output at low cost. In 2015, a new cluster-generation chemistry called exclusion amplification (ExAmp) was introduced in the newer Illumina machines. This advance has been widely adopted for genome sequencing because greater sequencing depth can be achieved at lower cost without compromising the quality of longer reads. We show, however, that this promising chemistry is problematic when multiplexing samples. We discovered that between 0.4% and 10% of sequencing reads (or signals) are incorrectly assigned from a given sample to other samples in a multiplexed pool. We provide evidence that this "spreading-of-signals" arises from low levels of free index primers present in the pool. The rate of signal spreading depends on the level of free index primers present in a library pool and is therefore variable among experiments. In a collaboration with Tianying Su, Rahul Sinha, and Kristy Red-Horse, I investigated the development of mouse coronary arteries using scRNA-Seq and mouse genetics. I developed a statistical test that categorizes subpopulations within scRNA-Seq datasets as continuous or discrete to identify candidate developmental transitions. I analyzed the transitions between coronary progenitors and artery cells computationally and in vivo, which revealed that the progenitor cells of the mouse heart undergo a gradual conversion from vein to artery before a subset crosses a threshold to differentiate into pre-artery cells. I showed that pre-artery cells in scRNA-Seq data appear prior to blood flow, contrary to previous assumptions about how the heart develops. We showed that a venous transcription factor, COUP-TFII, blocked progression to the pre-artery state through activation of cell cycle genes. I was also interested in how transcription factors maintain cell identity.
I therefore analyzed a dataset of more than 100,000 cells from 20 organs and tissues, produced by the Tabula Muris Consortium, to understand the transcription factor (TF) codes specifying cell identity in the mouse. One of the challenges of scRNA-Seq data is that nearly all studies are specific to a single organ, and it is difficult to compare data collected from different animals by independent labs with varying experimental techniques. To understand which TFs were most informative for specifying cell types, we used random forest machine learning to show that 136 TFs are needed to simultaneously define all cell types across all organs. I collected a compendium of transcription factor reprogramming protocols and showed that for nearly all of them, the TFs used also specified the targeted cell type in our data, suggesting that whole-organism scRNA-Seq data can inform novel reprogramming schemes.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Quake, Stephen Ronald
|Südhof, Thomas C
|Degree committee member
|Stanford University, Biophysics Program.
|Statement of responsibility
|Submitted to the Biophysics Program.
|Thesis Ph.D. Stanford University 2019.
- © 2019 by Geoffrey Stanley
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).