Distributional character clustering for Chinese text categorization

Xuezhong Zhou; Zhaohui Wu

Conference Proceedings

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2004), Vol. 3157, pp. 575-584

DOI: 10.1007/978-3-540-28633-2_61

Abstract

This paper presents and experimentally evaluates distributional character clustering, a novel feature generation method for Chinese text categorization that avoids word segmentation. We propose a hybrid clustering criterion function and a bisecting divisive clustering algorithm to improve the quality of the clusters. The experimental results show that distributional character clustering is an effective dimensionality reduction method: it reduces the feature space to very low dimensionality (e.g., 500 features) while maintaining high performance, and it performs much better than feature selection by information gain. Moreover, a Naïve Bayes classifier using distributional character clustering achieves state-of-the-art performance in Chinese text classification. © Springer-Verlag Berlin Heidelberg 2004.
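
The abstract describes clustering characters by how they are distributed over categories and splitting clusters with a bisecting divisive algorithm. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes each character is represented by an add-one-smoothed P(class | character), weights each character by its (smoothed) frequency, and scores a cluster by the weighted KL divergence of its members to the cluster centroid, a standard information-theoretic loss for distributional clustering. The paper's hybrid criterion function is not specified in the abstract and is not reproduced here; all function names and the toy counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical types: per-character class counts, and (weight, P(class | char)).
Counts = Dict[str, List[int]]
Dists = Dict[str, Tuple[float, List[float]]]


def kl(p: List[float], q: List[float]) -> float:
    """KL divergence D(p || q); q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def class_distributions(counts: Counts) -> Dists:
    """Map each character to (frequency weight, smoothed P(class | char)).

    Add-one smoothing is an assumption of this sketch: it keeps all
    probabilities and weights positive, so every KL value is finite.
    """
    smoothed = {ch: [x + 1 for x in c] for ch, c in counts.items()}
    total = sum(sum(c) for c in smoothed.values())
    return {ch: (sum(c) / total, [x / sum(c) for x in c])
            for ch, c in smoothed.items()}


def centroid(members: List[str], dists: Dists) -> List[float]:
    """Weight-averaged class distribution of a cluster."""
    w = sum(dists[m][0] for m in members)
    n_classes = len(dists[members[0]][1])
    return [sum(dists[m][0] * dists[m][1][j] for m in members) / w
            for j in range(n_classes)]


def loss(members: List[str], dists: Dists) -> float:
    """Within-cluster loss: sum of weight * KL(P(C|char) || centroid)."""
    cen = centroid(members, dists)
    return sum(dists[m][0] * kl(dists[m][1], cen) for m in members)


def bisect(members: List[str], dists: Dists, max_iter: int = 20) -> List[List[str]]:
    """Split a cluster (>= 2 members) in two by KL-based 2-means refinement."""
    cen = centroid(members, dists)
    a = max(members, key=lambda m: kl(dists[m][1], cen))  # farthest from centroid
    b = max((m for m in members if m != a),
            key=lambda m: kl(dists[m][1], dists[a][1]))  # farthest from a
    parts = ([m for m in members if m != b], [b])  # fallback if refinement degenerates
    cents = [dists[a][1], dists[b][1]]
    for _ in range(max_iter):
        new: Tuple[List[str], List[str]] = ([], [])
        for m in members:
            d0, d1 = kl(dists[m][1], cents[0]), kl(dists[m][1], cents[1])
            (new[0] if d0 <= d1 else new[1]).append(m)
        if not new[0] or not new[1] or new == parts:
            break  # empty side or converged: keep the last valid split
        parts = new
        cents = [centroid(p, dists) for p in parts]
    return [parts[0], parts[1]]


def divisive_cluster(counts: Counts, k: int) -> List[List[str]]:
    """Bisect the highest-loss cluster until there are k clusters."""
    dists = class_distributions(counts)
    clusters = [list(counts)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        worst = max(splittable, key=lambda c: loss(c, dists))
        clusters.remove(worst)
        clusters.extend(bisect(worst, dists))
    return clusters


def cluster_features(doc: str, clusters: List[List[str]]) -> List[int]:
    """Document vector: character occurrences summed within each cluster."""
    index = {ch: i for i, c in enumerate(clusters) for ch in c}
    vec = [0] * len(clusters)
    for ch in doc:
        if ch in index:
            vec[index[ch]] += 1
    return vec


# Toy counts over two classes [class 0, class 1]; illustrative only.
counts = {
    "球": [30, 1], "赛": [25, 2], "队": [20, 1],
    "股": [1, 40], "市": [2, 30], "银": [1, 22],
}
clusters = divisive_cluster(counts, k=2)
print(clusters)
print(cluster_features("球赛股市", clusters))
```

Because characters are grouped directly, no word segmentation is needed: a document becomes a k-dimensional vector of cluster counts, which could then be fed to a classifier such as multinomial Naïve Bayes, the pairing the abstract reports. Always splitting the cluster with the largest loss is one common choice for bisecting divisive clustering and is not necessarily the rule used in the paper.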

Cite

Zhou, X., & Wu, Z. (2004). Distributional character clustering for Chinese text categorization. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3157, pp. 575–584). Springer Verlag. https://doi.org/10.1007/978-3-540-28633-2_61