S2FS: Single score feature selection applied to the problem of distinguishing long non-coding RNAs from protein coding transcripts

Abstract

The task of distinguishing long non-coding RNAs (lncRNAs) from protein coding transcripts (PCTs) has previously been addressed with machine learning (ML) algorithms that use hundreds of features. However, a large number of features can degrade the predictive performance of these algorithms, for example by causing overfitting, a problem associated with the phenomenon known as the curse of dimensionality. Dimensionality reduction techniques, among them feature selection, have been proposed to deal with these problems. This work proposes and experimentally evaluates a simple and fast feature selection technique called Single Score Feature Selection (S2FS). To this end, the frequencies of 2-mers, 3-mers and 4-mers were first extracted from public databases of Homo sapiens PCTs and lncRNAs. This produced a dataset with a large number of features, composed of two groups of RNA sequences: one of PCTs and the other of lncRNAs. S2FS was then applied to this dataset to reduce the number of features. Experimental results showed that S2FS selected relevant features while maintaining predictive accuracy, at a lower processing cost than some existing feature selection techniques.
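
The feature extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes sequences are written over the alphabet A, C, G, T (with U mapped to T), that k-mers are counted over overlapping windows, and that each k-mer count is normalised by the number of valid windows of the same length. The function names `kmer_frequencies` and `feature_matrix` are hypothetical. Under these assumptions every transcript becomes a vector of 4^2 + 4^3 + 4^4 = 16 + 64 + 256 = 336 frequencies, which is the kind of high-dimensional input a feature selection technique such as S2FS is meant to reduce. The paper's exact counting and normalisation may differ.

```python
from itertools import product

# Assumption: nucleotide alphabet A, C, G, T; RNA 'U' is mapped to 'T'.
ALPHABET = "ACGT"


def kmer_feature_names(ks=(2, 3, 4)):
    """All possible k-mers for each k, in a fixed lexicographic order."""
    return ["".join(p) for k in ks for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(seq, ks=(2, 3, 4)):
    """Relative frequency of every k-mer over overlapping windows (hypothetical helper).

    For each k, counts are divided by the number of length-k windows that
    contain only A/C/G/T, so each k-block sums to 1 (or is all zeros).
    """
    seq = seq.upper().replace("U", "T")
    features = {}
    for k in ks:
        counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
        total = 0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows containing ambiguous bases such as N
                counts[kmer] += 1
                total += 1
        for kmer, c in counts.items():
            features[kmer] = c / total if total else 0.0
    return features


def feature_matrix(seqs, ks=(2, 3, 4)):
    """Build one row of k-mer frequencies per sequence, columns ordered by name."""
    names = kmer_feature_names(ks)
    rows = []
    for s in seqs:
        f = kmer_frequencies(s, ks)
        rows.append([f[n] for n in names])
    return names, rows


if __name__ == "__main__":
    names, X = feature_matrix(["AUGGCCAUUG", "ACGTNACGT"])
    print(len(names))  # 336 columns: 16 dinucleotides + 64 trinucleotides + 256 tetranucleotides
```

In a real experiment the rows for the lncRNA and PCT groups would be stacked with a class label and passed to the feature selection step; keeping the column order fixed through `kmer_feature_names` makes the selected features traceable back to specific k-mers.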

Citation (APA)

Kümmel, B. C., de Carvalho, A. C. P. L. F., Brigido, M. M., Ralha, C. G., & Walter, M. E. M. T. (2018). S2FS: Single score feature selection applied to the problem of distinguishing long non-coding RNAs from protein coding transcripts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11228 LNBI, pp. 103–113). Springer Verlag. https://doi.org/10.1007/978-3-030-01722-4_10
