Feature selection based on sampling and C4.5 algorithm to improve the quality of text classification using Naïve Bayes

Viviana Molano; Carlos Cobos; Martha Mendoza; Enrique Herrera-Viedma; Milos Manic

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8856, pp. 80–91 (2014)

DOI: 10.1007/978-3-319-13647-9_9

Abstract

Automatic classification of text into predefined categories is an increasingly important task, given the vast number of electronic documents available on the Internet and on enterprise servers. Successful text classification depends heavily on dimensionality reduction, which aims to improve classification accuracy, make the classification process more expressive, and improve its computational efficiency. This paper presents two feature-selection algorithms that build on the C4.5 algorithm: one based on sampling and one based on weighted sampling. Classification is then performed with the Naïve Bayes model in the reduced-dimensionality space. Experiments were carried out on data sets derived from the Reuters-21578 collection. The results show considerable improvements in classification accuracy (up to 10%) compared with traditional algorithms such as C4.5, Naïve Bayes, and Support Vector Machines.
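The abstract describes the pipeline only at a high level: sample the training set, grow C4.5 trees on the samples, keep the terms the trees split on, and classify with Naïve Bayes in the reduced space. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. Assumptions: documents are binary term sets; each tree is a simplified C4.5 (gain-ratio splits on term presence, a depth limit, no pruning); a term is selected if at least one sampled tree uses it; and the Naïve Bayes variant is Bernoulli. The weighted-sampling variant is not shown because the abstract does not say how its weights are computed. All function names and parameters (`select_features`, `n_samples`, `sample_frac`, `max_depth`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter

# Assumption: each document is a set of terms (binary presence); each label is a string.

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(docs, labels, term):
    """C4.5-style gain ratio for a binary split on the presence of `term`."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    if not with_t or not without_t:
        return 0.0  # the split does not partition the documents
    p = len(with_t) / len(labels)
    info_gain = entropy(labels) - p * entropy(with_t) - (1 - p) * entropy(without_t)
    split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return info_gain / split_info

def tree_features(docs, labels, vocab, depth, used):
    """Grow a simplified C4.5-like tree (no pruning) and record its split terms in `used`."""
    if depth == 0 or len(set(labels)) <= 1:
        return
    best = max(vocab, key=lambda t: gain_ratio(docs, labels, t))
    if gain_ratio(docs, labels, best) <= 0.0:
        return
    used.add(best)
    for present in (True, False):
        idx = [i for i, d in enumerate(docs) if (best in d) == present]
        tree_features([docs[i] for i in idx], [labels[i] for i in idx],
                      vocab, depth - 1, used)

def select_features(docs, labels, n_samples=10, sample_frac=0.7, max_depth=3, seed=0):
    """Hypothetical sampling-based selection: count how many sampled trees use each term."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))
    k = min(len(docs), max(1, int(sample_frac * len(docs))))
    counts = Counter()
    for _ in range(n_samples):
        idx = rng.sample(range(len(docs)), k)  # sample documents without replacement
        used = set()
        tree_features([docs[i] for i in idx], [labels[i] for i in idx],
                      vocab, max_depth, used)
        counts.update(used)
    return counts

def train_nb(docs, labels, features):
    """Bernoulli Naive Bayes with add-one (Laplace) smoothing over the selected features."""
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        dc = [d for d, y in zip(docs, labels) if y == c]
        cond[c] = {t: (sum(t in d for d in dc) + 1) / (len(dc) + 2) for t in features}
    return prior, cond

def predict_nb(model, doc, features):
    """Return the class with the highest log posterior, using only the selected features."""
    prior, cond = model

    def log_score(c):
        s = math.log(prior[c])
        for t in features:
            p = cond[c][t]
            s += math.log(p if t in doc else 1.0 - p)
        return s

    return max(prior, key=log_score)

# Toy usage with made-up documents (not Reuters-21578 data).
docs = [{"wheat", "grain", "export"}, {"grain", "corn", "harvest"}, {"wheat", "tonnes"},
        {"oil", "crude", "barrel"}, {"crude", "opec", "price"}, {"oil", "price", "barrel"}]
labels = ["grain", "grain", "grain", "crude", "crude", "crude"]
counts = select_features(docs, labels, n_samples=5, sample_frac=0.8, max_depth=2)
features = sorted(counts)  # keep every term used by at least one sampled tree
model = train_nb(docs, labels, features)
print(features, predict_nb(model, {"crude", "oil"}, features))
```

The returned `counts` record how many sampled trees used each term, so thresholding them (instead of keeping every used term) is one hypothetical way to trade the size of the reduced space against coverage.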

Citation (APA)

Molano, V., Cobos, C., Mendoza, M., Herrera-Viedma, E., & Manic, M. (2014). Feature selection based on sampling and C4.5 algorithm to improve the quality of text classification using naïve Bayes. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8856, 80–91. https://doi.org/10.1007/978-3-319-13647-9_9