Feature selection based on sampling and C4.5 algorithm to improve the quality of text classification using Naïve Bayes

Abstract

Automatic classification of text into predefined categories is an increasingly important task, given the vast number of electronic documents available on the Internet and on enterprise servers. Successful text classification depends heavily on dimensionality reduction, which aims to improve classification accuracy, make the classification process more expressive, and reduce its computational cost. This paper presents two feature selection algorithms, one based on sampling and one on weighted sampling, both of which build on the C4.5 algorithm. Classification is then performed with the Naïve Bayes model in the reduced feature space. Experiments on data sets derived from the Reuters-21578 collection show improvements in classification accuracy of up to 10% over traditional algorithms such as C4.5, Naïve Bayes, and Support Vector Machines.
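
The abstract does not spell out the two algorithms, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes documents are sets of tokens, draws repeated random samples, lets a term "vote" whenever its C4.5-style gain ratio on a sample reaches a threshold (a stand-in for growing a full C4.5 tree on each sample), keeps terms that win enough votes, and then classifies with a small Laplace-smoothed Naïve Bayes model over the retained terms. All function names, parameters (`n_samples`, `sample_frac`, `min_ratio`, `min_support`), and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(docs, labels, term):
    """C4.5-style gain ratio of the binary split 'term present / term absent'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    if not present or not absent:
        return 0.0  # the term does not split this sample
    p, a = len(present) / len(labels), len(absent) / len(labels)
    gain = entropy(labels) - p * entropy(present) - a * entropy(absent)
    split_info = -(p * math.log2(p) + a * math.log2(a))
    return gain / split_info

def select_features(docs, labels, n_samples=20, sample_frac=0.7,
                    min_ratio=0.5, min_support=0.5, seed=0):
    """Keep terms whose gain ratio reaches min_ratio in enough random samples.

    Assumption: thresholding single-term gain ratios replaces building a
    full C4.5 tree per sample and collecting the attributes it uses.
    """
    rng = random.Random(seed)
    size = max(2, int(sample_frac * len(docs)))
    votes = Counter()
    for _ in range(n_samples):
        idx = rng.sample(range(len(docs)), size)
        s_docs = [docs[i] for i in idx]
        s_labels = [labels[i] for i in idx]
        for term in set().union(*s_docs):
            if gain_ratio(s_docs, s_labels, term) >= min_ratio:
                votes[term] += 1
    return {t for t, v in votes.items() if v >= min_support * n_samples}

def train_nb(docs, labels, features):
    """Collect Naive Bayes counts restricted to the selected terms."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        term_counts[y].update(t for t in d if t in features)
    return class_counts, term_counts, features

def predict_nb(model, doc):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, term_counts, features = model
    n_docs = sum(class_counts.values())
    scores = {}
    for y, c in class_counts.items():
        total = sum(term_counts[y].values())
        score = math.log(c / n_docs)
        for t in doc:
            if t in features:  # terms outside the reduced space are ignored
                score += math.log((term_counts[y][t] + 1) / (total + len(features)))
        scores[y] = score
    return max(scores, key=scores.get)

# Hypothetical toy corpus; the paper itself uses Reuters-21578 data sets.
DOCS = [
    {"oil", "price", "the", "market"},
    {"oil", "barrel", "the", "export"},
    {"crude", "oil", "a", "price"},
    {"wheat", "harvest", "the", "crop"},
    {"wheat", "grain", "a", "export"},
    {"grain", "wheat", "the", "market"},
]
LABELS = ["crude", "crude", "crude", "grain", "grain", "grain"]

if __name__ == "__main__":
    selected = select_features(DOCS, LABELS)
    model = train_nb(DOCS, LABELS, selected)
    print(sorted(selected))
    print(predict_nb(model, {"oil", "barrel", "the"}))
```

On this toy corpus, terms such as "the" and "a" never separate the classes well in any sample and so are dropped, while terms that track a class consistently survive. A weighted-sampling variant would presumably change how documents are drawn for each sample, but the abstract gives no detail on the weighting, so it is not sketched here.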

Cite

Molano, V., Cobos, C., Mendoza, M., Herrera-Viedma, E., & Manic, M. (2014). Feature selection based on sampling and C4.5 algorithm to improve the quality of text classification using Naïve Bayes. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8856, 80–91. https://doi.org/10.1007/978-3-319-13647-9_9
