A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection

Abstract

Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on probabilistic topic assignments in each cell, we identify the latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases to be compared with known cell type–specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells would require heavy computational resources: specialized computing units, computing time and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods yet requires orders of magnitude less computing time and much less memory. We also show that our approach is widely applicable for atlas-scale data analysis; our method seamlessly integrates single-cell and bulk data in joint analysis, without requiring additional preprocessing or feature selection steps.
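The general idea in the abstract (fit the topic dictionary on a small pseudobulk matrix, then assign topics to individual cells) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pseudobulk_topic_sketch`, the sign-based random-projection bucketing, and the use of KL-divergence multiplicative-update NMF as a stand-in for the topic model are all assumptions chosen for illustration.

```python
import numpy as np

def pseudobulk_topic_sketch(X, n_topics=3, n_bits=4, n_iter=200, seed=0):
    """Illustrative only. X is a nonnegative genes x cells count matrix.

    Returns (beta, theta): beta is genes x topics with columns summing to 1,
    theta is topics x cells with columns summing to (approximately) 1.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    eps = 1e-12

    # 1) Assumed bucketing step: randomly project log-counts, keep the signs,
    #    and read each cell's sign pattern as a binary bucket code.
    R = rng.standard_normal((n_bits, n_genes))
    Z = R @ np.log1p(X)
    Z = Z - Z.mean(axis=1, keepdims=True)
    codes = (Z > 0).astype(int).T @ (1 << np.arange(n_bits))

    # 2) Sum the counts of cells sharing a code into one pseudobulk sample.
    _, group = np.unique(codes, return_inverse=True)
    group = np.asarray(group).ravel()
    n_groups = int(group.max()) + 1
    M = np.zeros((n_cells, n_groups))
    M[np.arange(n_cells), group] = 1.0
    Y = X @ M  # genes x pseudobulk samples

    # 3) Factorise the small matrix Y ~ W H with KL multiplicative updates.
    W = rng.random((n_genes, n_topics)) + 0.1
    H = rng.random((n_topics, n_groups)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ (Y / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((Y / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)

    # Rescale so each topic is a gene frequency vector (the dictionary).
    beta = W / (W.sum(axis=0, keepdims=True) + eps)

    # 4) With the dictionary fixed, estimate per-cell topic loadings.
    Hc = np.ones((n_topics, n_cells))
    for _ in range(n_iter):
        Hc *= (beta.T @ (X / (beta @ Hc + eps))) / (beta.sum(axis=0)[:, None] + eps)
    theta = Hc / (Hc.sum(axis=0, keepdims=True) + eps)
    return beta, theta
```

In this sketch the expensive factorisation runs on at most `2 ** n_bits` pseudobulk columns rather than on every cell, which is the source of the savings; the per-cell step only updates loadings against a fixed dictionary.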

Cite

Subedi, S., Sumida, T. S., & Park, Y. P. (2024). A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection. Life Science Alliance, 7(10). https://doi.org/10.26508/lsa.202402713
