Computational tools used for genomic analyses are becoming more accurate, but also increasingly sophisticated and complex. This creates a new problem: such software exposes a large number of tunable parameters, and their values often strongly influence the reported results. We quantify the impact of parameter choice on transcript assembly and take some first steps toward a truly automated genomic analysis pipeline by developing a method that automatically chooses input-specific parameter values for reference-based transcript assembly with the Scallop tool. On 1595 RNA-Seq samples from the Sequence Read Archive, choosing parameter values for each input increases the area under the receiver operating characteristic curve (AUC), computed by comparing assembled transcripts to a reference transcriptome, by an average of 28.9% over using only Scallop's default parameter values. The approach is general: applied to StringTie, it increases the AUC by an average of 13.1% on a set of 65 RNA-Seq experiments from ENCODE. Parameter advisors for both Scallop and StringTie are available on GitHub.
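The abstract does not spell out how candidate parameter values are generated or scored, so the following is only a minimal sketch of the general idea of per-input parameter advising, under stated assumptions. It assumes a fixed list of candidate parameter vectors, a caller-supplied function that runs the assembler on a sample and returns an accuracy score such as AUC, and a trapezoidal-rule AUC over (x, y) curve points. It also assumes "average improvement" means the mean of per-sample relative gains, which is only one possible reading. All names (`advise`, `trapezoid_auc`, `mean_percent_improvement`, `toy_run_and_score`, and the `threshold` parameter) are hypothetical and are not the authors' code or real Scallop/StringTie options.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A parameter vector maps (hypothetical) parameter names to values.
ParamVector = Dict[str, float]


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve through (x, y) points (trapezoidal rule)."""
    pts = sorted(points)  # integrate left to right along x
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def advise(
    sample: object,
    advisor_set: List[ParamVector],
    run_and_score: Callable[[object, ParamVector], float],
) -> Tuple[ParamVector, float]:
    """Return the candidate parameter vector with the highest score for this sample."""
    if not advisor_set:
        raise ValueError("advisor_set must contain at least one parameter vector")
    # Assumed: run_and_score assembles `sample` with the given parameters and
    # returns an accuracy score (e.g. AUC); higher is better.
    best_params = advisor_set[0]
    best_score = run_and_score(sample, best_params)
    for params in advisor_set[1:]:
        score = run_and_score(sample, params)
        if score > best_score:  # ties keep the earlier candidate
            best_params, best_score = params, score
    return best_params, best_score


def mean_percent_improvement(
    default_scores: Sequence[float], advised_scores: Sequence[float]
) -> float:
    """Mean of per-sample relative gains over the defaults, in percent (assumed definition)."""
    gains = [100.0 * (a - d) / d for d, a in zip(default_scores, advised_scores)]
    return sum(gains) / len(gains)


def toy_run_and_score(sample: object, params: ParamVector) -> float:
    """Made-up stand-in for assembling `sample` and computing AUC; `sample` is ignored."""
    t = params["threshold"]  # hypothetical parameter, not a real assembler option
    peak = 1.0 - abs(t - 0.5)  # pretend accuracy peaks at threshold 0.5
    return trapezoid_auc([(0.0, 0.0), (0.5, peak), (1.0, 1.0)])


if __name__ == "__main__":
    candidates = [{"threshold": 0.1}, {"threshold": 0.5}, {"threshold": 0.9}]
    default = candidates[0]  # pretend the first candidate is the tool's default
    best, auc = advise("sample_1", candidates, toy_run_and_score)
    base = toy_run_and_score("sample_1", default)
    print(best, round(auc, 3), round(base, 3))
    print(round(mean_percent_improvement([base], [auc]), 1))
```

In this toy setup the advisor picks `{"threshold": 0.5}` (AUC 0.75) over the assumed default (AUC 0.55), a relative gain of about 36.4%. Averaging per-sample gains, rather than comparing mean AUCs, is a design choice made here for illustration; the two can differ when samples have very different baseline AUCs.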
DeBlasio, D., Kim, K., & Kingsford, C. (2020). More Accurate Transcript Assembly via Parameter Advising. Journal of Computational Biology, 27(8), 1181–1189. https://doi.org/10.1089/cmb.2019.0286