A novel feature generation method for Chinese text categorization, distributional character clustering, which avoids word segmentation, is presented and experimentally evaluated. We propose a hybrid clustering criterion function and a bisecting divisive clustering algorithm to improve the quality of the clusters. The experimental results show that distributional character clustering is an effective dimensionality reduction method, which reduces the feature space to very low dimensionality (e.g., 500 features) while maintaining high performance. Its performance is much better than that of feature selection based on information gain. Moreover, a Naïve Bayes classifier with distributional character clustering achieves state-of-the-art performance in Chinese text classification. © Springer-Verlag Berlin Heidelberg 2004.
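As a rough illustration of the general idea, the sketch below groups characters by their class distributions and uses the resulting clusters as features, so no word segmentation is needed. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the hybrid criterion function, so a plain weighted KL-divergence criterion is swapped in, P(class | character) is estimated with Laplace smoothing, and all function names (`char_class_distributions`, `bisect`, `cluster_characters`, `featurize`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def char_class_distributions(docs, labels, alpha=1.0):
    """Estimate Laplace-smoothed P(class | character) and per-character counts."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)  # character -> Counter(class -> occurrences)
    for text, label in zip(docs, labels):
        for ch in text:
            if not ch.isspace():
                counts[ch][label] += 1
    dists, weights = {}, {}
    for ch, c in counts.items():
        total = sum(c.values())
        dists[ch] = [(c[k] + alpha) / (total + alpha * len(classes)) for k in classes]
        weights[ch] = total
    return dists, weights

def kl(p, q):
    """KL divergence KL(p || q); q is strictly positive thanks to smoothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def centroid(members, dists, weights):
    """Count-weighted mean of the members' class distributions."""
    total = sum(weights[m] for m in members)
    n = len(dists[members[0]])
    return [sum(weights[m] * dists[m][i] for m in members) / total for i in range(n)]

def scatter(members, dists, weights):
    """Assumed criterion: weighted KL divergence of members to their centroid."""
    c = centroid(members, dists, weights)
    return sum(weights[m] * kl(dists[m], c) for m in members)

def bisect(members, dists, weights, iters=10):
    """Split one cluster in two with a deterministic 2-means under KL divergence."""
    c = centroid(members, dists, weights)
    seed_a = max(members, key=lambda m: kl(dists[m], c))
    seed_b = max(members, key=lambda m: kl(dists[m], dists[seed_a]))
    cents = [dists[seed_a], dists[seed_b]]
    parts = None
    for _ in range(iters):
        new = [[], []]
        for m in members:
            nearer_a = kl(dists[m], cents[0]) <= kl(dists[m], cents[1])
            new[0 if nearer_a else 1].append(m)
        if not new[0] or not new[1] or new == parts:
            break
        parts = new
        cents = [centroid(p, dists, weights) for p in parts]
    return parts

def cluster_characters(dists, weights, k):
    """Bisecting divisive clustering: repeatedly split the highest-scatter cluster."""
    clusters = [sorted(dists)]
    while len(clusters) < k:
        worst = max(clusters, key=lambda c: scatter(c, dists, weights))
        if len(worst) < 2 or scatter(worst, dists, weights) <= 1e-12:
            break  # nothing left that is worth splitting
        clusters.remove(worst)
        clusters.extend(bisect(worst, dists, weights))
    return clusters

def featurize(text, clusters):
    """Represent a document as counts of characters falling in each cluster."""
    index = {ch: i for i, members in enumerate(clusters) for ch in members}
    vec = [0] * len(clusters)
    for ch in text:
        if ch in index:
            vec[index[ch]] += 1
    return vec

# Toy data (invented for illustration only).
docs = ["球赛球队", "球员比赛", "股票股市", "股价市场"]
labels = ["sports", "sports", "finance", "finance"]
dists, weights = char_class_distributions(docs, labels)
clusters = cluster_characters(dists, weights, k=2)
features = featurize("球市", clusters)
```

The resulting vectors have one dimension per cluster (here 2; the abstract mentions on the order of 500) and could then be passed to a classifier such as Naïve Bayes.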
Zhou, X., & Wu, Z. (2004). Distributional character clustering for Chinese text categorization. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3157, pp. 575–584). Springer Verlag. https://doi.org/10.1007/978-3-540-28633-2_61