Scaling single cell genomics analysis to millions of cells

Abstract/Contents

Abstract
Improved experimental methods in single cell genomics have increased dataset sizes by two orders of magnitude in the last five years, such that software scalability is quickly becoming a key bottleneck in our ability to analyze and understand multi-million cell atlases. Most analysis methods load full datasets in memory, resulting in excessive memory usage that scales one-to-one with dataset size. Furthermore, existing compressed storage formats for single cell datasets are so slow to read that practical analysis must be performed on uncompressed data. This work describes BPCells, a software package for scalable analysis of massive single cell RNA-seq and ATAC-seq datasets. BPCells provides lossless, seekable bitpacking compression for scATAC-seq fragment alignments and sparse single cell counts matrices. These compression formats are so fast that a single thread can decompress a dataset faster than loading an uncompressed version from a hard drive. Additionally, BPCells implements disk-backed streaming computations that can reduce memory requirements by two orders of magnitude compared to popular tools like Scanpy and Seurat, while incurring little or no speed penalty. Notably, BPCells can reproduce the results of existing software packages to within numerical precision, making it a drop-in replacement for existing tools. This work covers the design and implementation of BPCells, along with applications of single cell analysis.
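As a rough illustration of the general idea behind bitpacking compression mentioned in the abstract, the following minimal Python sketch delta-encodes a sorted list of integers and packs the deltas at a fixed bit width. It is not the BPCells format or API: the function names (`delta_encode`, `pack_bits`, and so on) and the layout are assumptions made for this example only, and it does not attempt the seekable, block-based design the thesis describes.

```python
# Illustrative sketch only: hypothetical helpers, not the BPCells on-disk format.

def delta_encode(sorted_values):
    """Replace each value with its difference from the previous value."""
    prev = 0
    deltas = []
    for v in sorted_values:
        deltas.append(v - prev)
        prev = v
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values


def pack_bits(values):
    """Pack non-negative integers using the smallest fixed bit width that fits all of them."""
    width = max((v.bit_length() for v in values), default=0) or 1
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)  # place value i at bit offset i * width
    n_bytes = (len(values) * width + 7) // 8
    return width, len(values), acc.to_bytes(n_bytes, "little")


def unpack_bits(width, count, payload):
    """Recover the integers written by pack_bits."""
    acc = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


if __name__ == "__main__":
    # Hypothetical sorted coordinates; the first value is kept separately
    # so that it does not inflate the bit width of the remaining deltas.
    positions = [1000, 1003, 1010, 1012, 1040]
    first, rest = positions[0], delta_encode(positions)[1:]
    width, count, payload = pack_bits(rest)
    restored = delta_decode([first] + unpack_bits(width, count, payload))
    assert restored == positions
    print(width, len(payload))
```

Because small deltas need few bits, the packed payload here is much smaller than storing each position as a 32-bit integer, and decoding is lossless. A practical format would also work in blocks so that a reader can seek without decoding everything before it.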

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Parks, Benjamin Ezra
Degree supervisor Greenleaf, William James
Degree supervisor Kundaje, Anshul, 1980-
Thesis advisor Greenleaf, William James
Thesis advisor Kundaje, Anshul, 1980-
Thesis advisor Dror, Ron, 1975-
Degree committee member Dror, Ron, 1975-
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Benjamin Parks.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/tm851gy3529

Access conditions

Copyright
© 2023 by Benjamin Ezra Parks
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
