The development of high-throughput sequencing technology has produced a large volume of transcriptome data. Identification of long non-protein-coding RNAs (lncRNAs) is pervasive in transcriptome studies because of their important roles in biological processes. This paper proposes a computational method for identifying lncRNAs based on machine learning. The method first extracts features by traversing each transcript sequence with k-mers, which yields a large class of features, and combines these with GC content and sequence length. It then applies a variance test, tuned by grid search, to select among the three kinds of features, reducing the data dimensionality and the computational load on the support vector machine (SVM) used to build the recognition model. The final model is reasonably stable and robust. The method achieves 95.7% accuracy and an AUC of 0.99 on the test dataset, and is therefore promising for identifying lncRNAs.
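The feature-extraction and variance-filtering steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of k = 1, 2, 3, the variance threshold, and the toy transcripts are all assumptions made for illustration, and the SVM training step is not shown.

```python
from itertools import product
from statistics import pvariance

ALPHABET = "ACGT"


def kmer_frequencies(seq, k):
    """Slide a window of width k over seq and return normalised k-mer counts."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def featurize(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequencies, GC content, and sequence length.

    The values of k are an assumption; the abstract does not state them.
    """
    features = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    features.append(gc_content(seq))
    features.append(float(len(seq)))
    return features


def variance_filter(rows, threshold):
    """Return indices of feature columns whose variance exceeds threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


# Toy transcripts, invented purely for illustration.
transcripts = ["ATGCGCGTTA", "AAAATTTTAAAT", "GGGCCCGGGCAT"]
matrix = [featurize(s) for s in transcripts]

# The threshold is a hypothetical value; the abstract indicates it is tuned by grid search.
kept = variance_filter(matrix, threshold=1e-3)
reduced = [[row[j] for j in kept] for row in matrix]
```

In a full pipeline, the rows of `reduced` would be passed to an SVM classifier, for example scikit-learn's `SVC`, with the variance threshold and the SVM hyperparameters chosen together by grid search over a validation split.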
CITATION STYLE
Li, Y., Ou, Y., Xu, Z., & Gong, L. (2019). Identifying lncRNA based on support vector machine. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11837 LNCS, pp. 68–75). Springer. https://doi.org/10.1007/978-3-030-32962-4_7