The use of RNA-Seq has transformed the way sequencing reads are analyzed, allowing for qualitative and quantitative studies of transcriptomes. These studies always include an important collection (usually > 40%) of unknown transcripts. In this study, we improve the capability of Full-LengtherNext, an algorithm developed in our laboratory to annotate, analyze and correct de novo transcriptomes, to detect potentially coding sequences. Here we analyze five software implementations of coding sequence predictors and show that the use of high-quality sequences at the training stage, proper threshold selection during score interrogation and the adaptation of the algorithm to its input type have a profound effect on the accuracy of the prediction. TransDecoder, the best-performing algorithm in our tests, was thus added to the Full-LengtherNext pipeline, significantly improving the reliability of its coding prediction. Moreover, these analyses served to make inferences about the quality of the sample and to extract the subset of species-specific (perhaps novel) genes discovered in the transcriptome assembly. Indirectly, we also demonstrated that the Full-LengtherNext sequence classification is appropriate and worth taking into consideration.
Velasco, D., Seoane, P., & Gonzalo Claros, M. (2015). Bioinformatics analyses to separate species specific mRNAs from unknown sequences in de novo assembled transcriptomes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9044, pp. 322–332). Springer Verlag. https://doi.org/10.1007/978-3-319-16480-9_32