An improved SVM-T-RFE based on intensity-dependent normalization for feature selection in gene expression of big-data


Abstract

Thanks to the next-generation sequencing (NGS) revolution, high-throughput RNA sequencing (RNA-seq) has become a highly sensitive and accurate method of measuring gene expression. Because RNA-seq generates a huge amount of data, researchers have struggled with a lack of computational methods able to exploit it. In most cases, existing methods have not provided an adequate feature scaling scheme for RNA-seq big data. RNA-seq thus encourages computational biologists to identify both novel and well-known features, and it has led to wider adoption of earlier methods and to the development of new, scalable data analysis methods. It has also drawn attention to some deep learning methods that are scalable and adaptable for assuming and selecting highly correlated genes for classification and prediction. However, some assumptions of those methods are not always correct, and the methods have been considered unstable for large-scale gene expression profiling. We therefore propose an improved feature selection technique that extends the well-known support vector machine recursive feature elimination (SVM-RFE) with t-statistics and is based on intensity-dependent normalization, which uses the log differential expression ratio (M vs. A plot) to improve scalability. In each iteration of SVM-RFE, the less dominant feature set with respect to relevance and redundancy is excluded from the current set of features. The proposed algorithm includes the most relevant and least redundant features in the final feature set, achieving comparable accuracy with small subsets of big data such as NCBI-GEO. The proposed algorithm is compared with the existing one on several known data sets. The comparison finds that the proposed algorithm is more convenient and faster than the previous one because it relies on R package functions throughout, and that it reduces the time consumed on big data.
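The pipeline the abstract describes (M vs. A transform, intensity-dependent normalization, then recursive elimination driven by SVM weights and t-statistics) can be illustrated with a minimal sketch. This is not the authors' R implementation. It is written in plain Python, and every function name here (`ma_transform`, `intensity_normalize`, `welch_t`, `linear_svm_weights`, `svm_t_rfe`) is hypothetical. Three assumptions go beyond the abstract. First, a binned median over A stands in for the loess fit that intensity-dependent normalization usually uses. Second, the ranking score mixes the normalized squared SVM weight and the normalized absolute t-statistic with an assumed weight `theta`. Third, the redundancy criterion mentioned in the abstract is left out.

```python
import math
from statistics import mean, median, variance

def ma_transform(r, g):
    """Return (M, A) per gene: M = log ratio, A = mean log intensity."""
    m = [math.log2(x) - math.log2(y) for x, y in zip(r, g)]
    a = [0.5 * (math.log2(x) + math.log2(y)) for x, y in zip(r, g)]
    return m, a

def intensity_normalize(m, a, n_bins=4):
    """Subtract the median M within each A-ordered bin.
    Assumption: a binned median replaces the usual loess curve."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = math.ceil(len(order) / n_bins)
    out = list(m)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        centre = median(m[i] for i in idx)
        for i in idx:
            out[i] = m[i] - centre
    return out

def welch_t(xs, ys):
    """Welch two-sample t-statistic for a single gene."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se if se > 0 else 0.0

def linear_svm_weights(X, y, lam=0.01, epochs=200, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the subgradient.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                gw = [gj - yi * xj / n for gj, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w

def svm_t_rfe(X, y, n_keep, theta=0.5):
    """Drop the lowest-scoring gene until n_keep remain (labels are +1/-1).
    Assumed score: theta * w^2 / sum(w^2) + (1 - theta) * |t| / sum(|t|)."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w2 = [wj * wj for wj in linear_svm_weights(sub, y)]
        t = [abs(welch_t([r[k] for r, yi in zip(sub, y) if yi == 1],
                         [r[k] for r, yi in zip(sub, y) if yi == -1]))
             for k in range(len(active))]
        sw, st = sum(w2) or 1.0, sum(t) or 1.0
        score = [theta * wk / sw + (1 - theta) * tk / st
                 for wk, tk in zip(w2, t)]
        active.pop(score.index(min(score)))
    return active
```

In this sketch, `X` holds one row per sample and one column per gene, already normalized, and `svm_t_rfe(X, y, n_keep)` returns the column indices of the surviving genes. Removing one gene per iteration keeps the loop easy to follow. On data with many genes, a practical variant would likely remove a block of genes in each iteration.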

Cite

Kim, C., & Kim, H. young. (2017). An improved SVM-T-RFE based on intensity-dependent normalization for feature selection in gene expression of big-data. In Lecture Notes in Electrical Engineering (Vol. 449, pp. 44–51). Springer Verlag. https://doi.org/10.1007/978-981-10-6451-7_6
