Scaling single cell genomics analysis to millions of cells
Abstract/Contents
- Abstract
- Improved experimental methods in single cell genomics have increased dataset sizes by two orders of magnitude in the last five years, such that software scalability is quickly becoming a key bottleneck in our ability to analyze and understand multi-million cell atlases. Most analysis methods load full datasets in memory, resulting in excessive memory usage that scales one-to-one with dataset size. Furthermore, existing compressed storage formats for single cell datasets are so slow to read that practical analysis must be performed on uncompressed data. This work describes BPCells, a software package for scalable analysis of massive single cell RNA-seq and ATAC-seq datasets. BPCells provides lossless, seekable bitpacking compression for scATAC-seq fragment alignments and sparse single cell counts matrices. These compression formats are so fast that a single thread can decompress a dataset faster than loading an uncompressed version from a hard drive. Additionally, BPCells implements disk-backed streaming computations that can reduce memory requirements by two orders of magnitude compared to popular tools like Scanpy and Seurat, while incurring little or no speed penalty. Notably, BPCells can reproduce the results of existing software packages to within numerical precision, making it a drop-in replacement for existing tools. This work covers the design and implementation of BPCells, along with applications of single cell analysis.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2023; ©2023 |
Publication date | 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Parks, Benjamin Ezra |
---|---|
Degree supervisor | Greenleaf, William James |
Degree supervisor | Kundaje, Anshul, 1980- |
Thesis advisor | Greenleaf, William James |
Thesis advisor | Kundaje, Anshul, 1980- |
Thesis advisor | Dror, Ron, 1975- |
Degree committee member | Dror, Ron, 1975- |
Associated with | Stanford University, School of Engineering |
Associated with | Stanford University, Computer Science Department |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Benjamin Parks. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2023. |
Location | https://purl.stanford.edu/tm851gy3529 |
Access conditions
- Copyright
- © 2023 by Benjamin Ezra Parks
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).