Distributional character clustering for Chinese text categorization

Abstract

A novel feature generation method for Chinese text categorization, distributional character clustering, which avoids word segmentation, is presented and experimentally evaluated. We propose a hybrid clustering criterion function and a bisecting divisive clustering algorithm to improve the quality of the clusters. The experimental results show that distributional character clustering is an effective dimensionality reduction method: it reduces the feature space to very low dimensionality (e.g., 500 features) while maintaining high performance, performing much better than information gain. Moreover, a Naïve Bayes classifier with distributional character clustering achieves state-of-the-art performance in Chinese text classification. © Springer-Verlag Berlin Heidelberg 2004.
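To make the idea concrete, the following is a minimal sketch of distributional character clustering with bisecting divisive clustering. Each Chinese character c is treated as a feature and represented by its class distribution P(C|c), so no word segmentation is needed. Characters whose class distributions are similar are grouped, and a document is then represented by how many of its characters fall in each cluster. The abstract does not specify the hybrid clustering criterion, so this sketch substitutes a plain information-theoretic loss (prior-weighted KL divergence of each character's P(C|c) to its cluster centroid). The function names, the smoothing constant alpha, the seeding rule, and the toy corpus are all assumptions of this illustration, not the authors' method. The parameter k plays the role of the target dimensionality (e.g., 500 in the abstract's experiments).

```python
import math
from collections import Counter, defaultdict


def char_class_distributions(docs, alpha=0.01):
    """From (text, label) pairs, return the sorted class list and, per character,
    (prior weight n(c)/N, smoothed P(C|c) as a list over classes).
    alpha is additive smoothing (an assumption of this sketch) that keeps every
    probability positive so the KL divergences below stay finite."""
    classes = sorted({label for _, label in docs})
    counts = defaultdict(Counter)
    for text, label in docs:
        for ch in text:
            if not ch.isspace():  # characters are the features; no segmentation
                counts[ch][label] += 1
    total = sum(sum(c.values()) for c in counts.values())
    dists = {}
    for ch, c in counts.items():
        n = sum(c.values())
        denom = n + alpha * len(classes)
        dists[ch] = (n / total, [(c[k] + alpha) / denom for k in classes])
    return classes, dists


def kl(p, q):
    """KL divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def centroid(members, dists):
    """Prior-weighted mean of the members' class distributions."""
    w = sum(dists[ch][0] for ch in members)
    m = len(dists[members[0]][1])
    return [sum(dists[ch][0] * dists[ch][1][j] for ch in members) / w
            for j in range(m)]


def loss(members, dists):
    """Stand-in criterion: weighted KL of each member to its cluster centroid."""
    q = centroid(members, dists)
    return sum(dists[ch][0] * kl(dists[ch][1], q) for ch in members)


def bisect(members, dists, max_iter=20):
    """Split one cluster in two, 2-means style, assigning by KL to sub-centroids."""
    a = max(members, key=lambda ch: (dists[ch][0], ch))  # heaviest character
    b = max(members, key=lambda ch: (kl(dists[ch][1], dists[a][1]), ch))  # farthest from a
    qa, qb = dists[a][1], dists[b][1]
    left, right = [], []
    for _ in range(max_iter):
        to_a = {ch: kl(dists[ch][1], qa) <= kl(dists[ch][1], qb) for ch in members}
        new_left = [ch for ch in members if to_a[ch]]
        new_right = [ch for ch in members if not to_a[ch]]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split, or assignments are stable
        left, right = new_left, new_right
        qa, qb = centroid(left, dists), centroid(right, dists)
    if not left:  # all distributions identical: fall back to an arbitrary halving
        ordered = sorted(members)
        return ordered[: len(ordered) // 2], ordered[len(ordered) // 2:]
    return left, right


def divisive_cluster(dists, k):
    """Bisecting divisive clustering: repeatedly split the highest-loss cluster."""
    clusters = [sorted(dists)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        worst = max(splittable, key=lambda c: loss(c, dists))
        clusters.remove(worst)
        clusters.extend(bisect(worst, dists))
    return clusters


def doc_features(text, clusters):
    """Bag of clusters: count the characters of text that fall in each cluster."""
    index = {ch: i for i, c in enumerate(clusters) for ch in c}
    vec = [0] * len(clusters)
    for ch in text:
        if ch in index:
            vec[index[ch]] += 1
    return vec


if __name__ == "__main__":
    # Hypothetical toy corpus, for illustration only.
    docs = [("股票市场上涨", "finance"), ("银行利率下调", "finance"),
            ("足球比赛进球", "sports"), ("篮球比赛胜利", "sports")]
    classes, dists = char_class_distributions(docs)
    clusters = divisive_cluster(dists, k=2)
    for c in clusters:
        print("".join(c))
    print(doc_features("足球比赛", clusters))
```

The resulting k-dimensional count vectors could then be passed to any classifier, for example a multinomial Naïve Bayes, which is the kind of pipeline the abstract evaluates. Because the loss here is a stand-in for the unspecified hybrid criterion, cluster quality on real data may differ from what the paper reports.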

Citation (APA)

Zhou, X., & Wu, Z. (2004). Distributional character clustering for Chinese text categorization. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3157, pp. 575–584). Springer-Verlag. https://doi.org/10.1007/978-3-540-28633-2_61
