Abstract
Motivation: High-throughput RNA sequencing has become indispensable for decoding gene activities, yet the challenge of reconstructing full-length transcripts persists. Traditional single-sample assemblers frequently produce fragmented transcripts, especially in single-cell RNA-seq data. While algorithms designed for assembling multiple samples exist, they encounter various limitations.

Results: We present Aletsch, a new assembler for multiple bulk or single-cell RNA-seq samples. Aletsch incorporates several algorithmic innovations, including a "bridging" system that can effectively integrate multiple samples to restore junctions missed in individual samples, and a new graph-decomposition algorithm that leverages "supporting" information across multiple samples to guide the decomposition of complex vertices. A standout feature of Aletsch is its application of a random forest model with 50 well-designed features for scoring transcripts. We demonstrate its robust adaptability across different chromosomes, datasets, and species. Our experiments, conducted on RNA-seq data from several protocols, show that Aletsch substantially outperforms existing meta-assemblers. As an example, when measured with the partial area under the precision-recall curve (pAUC, constrained by precision), Aletsch surpasses the leading assemblers TransMeta by 22.9%-62.1% and PsiCLASS by 23.0%-175.5% on human datasets.
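To make the pAUC metric concrete, the following is a minimal sketch of a partial area under a precision-recall curve restricted by a precision floor. The abstract does not give the exact definition used in the paper, so the function name `partial_pr_auc`, the `min_precision` parameter, the trapezoidal rule, and the toy curve are all assumptions for illustration; in particular, this version simply drops points below the floor and does not interpolate at the cutoff.

```python
def partial_pr_auc(points, min_precision):
    """Trapezoidal area under a precision-recall curve, using only the
    points whose precision is at least min_precision.

    points: iterable of (precision, recall) pairs, e.g. one pair per
    score threshold applied to the assembled transcripts (assumed input).
    """
    # Keep qualifying points and order them by recall (the x-axis).
    kept = sorted((r, p) for p, r in points if p >= min_precision)
    area = 0.0
    # Sum trapezoids between consecutive retained points.
    for (r0, p0), (r1, p1) in zip(kept, kept[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Hypothetical toy curve, not data from the paper: (precision, recall).
toy_curve = [(1.0, 0.0), (0.9, 0.2), (0.8, 0.4), (0.5, 0.6)]

full_area = partial_pr_auc(toy_curve, min_precision=0.0)
partial_area = partial_pr_auc(toy_curve, min_precision=0.8)
```

Under these assumptions, a relative improvement between two assemblers would be the ratio of their partial areas minus one, computed at the same precision floor.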
Cite
Shi, Q., Zhang, Q., & Shao, M. (2024). Accurate assembly of multiple RNA-seq samples with Aletsch. Bioinformatics, 40, i307–i317. https://doi.org/10.1093/bioinformatics/btae215