Combining homogeneous classifiers for centroid-based text classification

Verayuth Lertnattee; Thanaruk Theeramunkong

Conference Proceedings

Proceedings - IEEE Symposium on Computers and Communications (2002) 1034-1039

DOI: 10.1109/ISCC.2002.1021799

Abstract

Centroid-based text classification is one of the most popular supervised approaches for classifying texts into a set of pre-defined classes. Because it is based on the vector-space model, its performance depends particularly on how important terms in documents are weighted and selected when constructing a prototype vector for each class. Previous work showed that term weighting using statistical term distributions could improve classification accuracy. However, the best weighting system differs from one data set to another. To address this problem, we propose a method that combines homogeneous centroid-based classifiers. The effectiveness of this approach is explored using four data sets. Two main factors are taken into account: model selection and score combination. Experimental results show that our system improves classification accuracy by up to 7.5-8.5% over the k-NN classifier, 3.7-4.0% over the naive Bayes classifier, and 1.6-2.7% over the best single-model classification method (p<0.05). © 2002 IEEE.
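To make the idea concrete, here is a minimal sketch of centroid-based classification with score combination. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify the weighting schemes, the model selection procedure, or the combination rule. The two weighting functions (`tf_weight`, `tfidf_weight`), the cosine similarity score, the averaging rule in `combine_by_average`, and the toy documents are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def tf_weight(tokens, idf):
    # Assumed scheme 1: raw term frequency (idf is ignored).
    return dict(Counter(tokens))


def tfidf_weight(tokens, idf):
    # Assumed scheme 2: term frequency times inverse document frequency.
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def normalize(vec):
    # Scale a sparse vector to unit length; an all-zero vector becomes empty.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both inputs are unit vectors, so the dot product is the cosine.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class CentroidClassifier:
    """One prototype (centroid) vector per class, for a given term weighting."""

    def __init__(self, weight_fn):
        self.weight_fn = weight_fn
        self.idf = {}
        self.centroids = {}

    def fit(self, docs, labels):
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        self.idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
        sums = defaultdict(lambda: defaultdict(float))
        for d, y in zip(docs, labels):
            for t, v in normalize(self.weight_fn(d, self.idf)).items():
                sums[y][t] += v
        # The class prototype is the normalized sum of its document vectors.
        self.centroids = {y: normalize(vec) for y, vec in sums.items()}
        return self

    def scores(self, doc):
        q = normalize(self.weight_fn(doc, self.idf))
        return {y: cosine(q, c) for y, c in self.centroids.items()}


def combine_by_average(classifiers, doc):
    # Assumed combination rule: average the per-class scores of all models.
    totals = defaultdict(float)
    for clf in classifiers:
        for y, s in clf.scores(doc).items():
            totals[y] += s / len(classifiers)
    return max(totals, key=totals.get), dict(totals)


# Toy, made-up training data (already tokenized).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "match"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]
labels = ["sport", "sport", "finance", "finance"]

ensemble = [
    CentroidClassifier(tf_weight).fit(docs, labels),
    CentroidClassifier(tfidf_weight).fit(docs, labels),
]
label, scores = combine_by_average(ensemble, ["team", "goal", "price"])
print(label, scores)
```

The classifiers are homogeneous in the sense that they share the same centroid-based algorithm and differ only in term weighting. Other combination rules (for example, taking the maximum score per class) could be swapped into `combine_by_average` without changing the rest of the sketch.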

Cite

Lertnattee, V., & Theeramunkong, T. (2002). Combining homogeneous classifiers for centroid-based text classification. In Proceedings - IEEE Symposium on Computers and Communications (pp. 1034–1039). https://doi.org/10.1109/ISCC.2002.1021799
