Scalable and unbiased sequence-informed embedding of single-cell ATAC-seq data with CellSpace

8Citations
Citations of this article
40Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Standard scATAC sequencing (scATAC-seq) analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore genomic sequence information at accessible loci. Here we present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space, to address this limitation. We show that CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and can score transcription factor activities in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia.

Cite

CITATION STYLE

APA

Tayyebi, Z., Pine, A. R., & Leslie, C. S. (2024). Scalable and unbiased sequence-informed embedding of single-cell ATAC-seq data with CellSpace. Nature Methods, 21(6), 1014–1022. https://doi.org/10.1038/s41592-024-02274-x

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free