Automatic text classification into predefined categories is an increasingly important task given the vast number of electronic documents available on the Internet and enterprise servers. Successful text classification relies heavily on the vital task of dimensionality reduction, which aims to improve classification accuracy, give greater expression to the classification process, and improve classification computational efficiency. In this paper, two algorithms for feature selection are presented, based on sampling and weighted sampling that build on the C4.5 algorithm. The results demonstrate considerable improvements with regard to classification accuracy - up to 10% - compared to traditional algorithms such as C4.5, Naïve Bayes and Support Vector Machines. The classification process is performed using the Naïve Bayes model in the space of reduced dimensionality. Experiments were carried out using data sets based on the Reuters-21578 collection.
CITATION STYLE
Molano, V., Cobos, C., Mendoza, M., Herrera-Viedma, E., & Manic, M. (2014). Feature selection based on sampling and C4.5 algorithm to improve the quality of text classification using naïve bayes. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8856, 80–91. https://doi.org/10.1007/978-3-319-13647-9_9
Mendeley helps you to discover research relevant for your work.