Abstract
Data analysis tools are continuously changed and improved over time. To test how these changes influence the comparability between analyses, the outputs of different workflow options of the nf-core/rnaseq pipeline were compared. Five different pipeline settings (STAR+Salmon, STAR+RSEM, STAR+featureCounts, HISAT2+featureCounts, and the pseudoaligner Salmon) were run on three datasets (human, Arabidopsis, zebrafish) containing spike-ins from the External RNA Controls Consortium (ERCC). Fold change ratios and differential expression of genes and spike-ins were used to compare the different tools and version settings of the pipeline. An overlap of 85% in differential gene classification between pipelines was shown. Genes interpreted with a bias were mostly those present at lower concentrations. The number of isoforms and exons per gene were also determinants of bias. Previous pipeline versions using featureCounts showed a higher sensitivity for detecting single-isoform genes such as the ERCC spike-ins. To ensure data comparability in long-term analysis series, it is advisable either to stay with the pipeline version the series was initialized with or to run both versions during a transition period, so that the target genes are addressed in the same way.
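The abstract does not define how the 85% classification overlap was computed. One plausible reading, sketched below under stated assumptions, is the fraction of shared genes that two pipelines place in the same differential expression class. The cut-offs (adjusted p-value below 0.05, absolute log2 fold change of at least 1), the function names `classify` and `classification_overlap`, and the toy gene values are all hypothetical illustrations, not the study's method or data.

```python
from typing import Dict, Tuple

# Each result maps a gene ID to (log2 fold change, adjusted p-value).
Results = Dict[str, Tuple[float, float]]


def classify(log2fc: float, padj: float, alpha: float = 0.05, min_lfc: float = 1.0) -> str:
    """Assign a gene to "up", "down" or "ns" (not significant).

    The thresholds are assumptions for illustration only.
    """
    if padj < alpha and log2fc >= min_lfc:
        return "up"
    if padj < alpha and log2fc <= -min_lfc:
        return "down"
    return "ns"


def classification_overlap(res_a: Results, res_b: Results) -> float:
    """Fraction of genes present in both results that get the same class."""
    shared = res_a.keys() & res_b.keys()
    if not shared:
        raise ValueError("the two result sets share no genes")
    agree = sum(classify(*res_a[g]) == classify(*res_b[g]) for g in shared)
    return agree / len(shared)


# Made-up toy values for two hypothetical pipeline runs (not data from the study).
pipeline_a: Results = {
    "GENE_A": (2.1, 0.001),
    "GENE_B": (0.3, 0.6),
    "SPIKE_1": (-1.5, 0.01),
    "GENE_C": (1.2, 0.04),
}
pipeline_b: Results = {
    "GENE_A": (1.8, 0.002),
    "GENE_B": (0.1, 0.9),
    "SPIKE_1": (-0.4, 0.2),
    "GENE_C": (1.1, 0.03),
}

print(classification_overlap(pipeline_a, pipeline_b))  # 0.75
```

In this toy example the two runs disagree only on `SPIKE_1`, which one run calls "down" and the other "ns", so three of four shared genes agree. Restricting the comparison to genes present in both outputs is a design choice; counting genes missing from one pipeline as disagreements would give a stricter measure.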
Cite
Perelo, L. W., Gabernet, G., Straub, D., & Nahnsen, S. (2024). How tool combinations in different pipeline versions affect the outcome in RNA-seq analysis. NAR Genomics and Bioinformatics, 6(1). https://doi.org/10.1093/nargab/lqae020