Text categorization using distributional clustering and concept extraction

Yifan He; Minghu Jiang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4681 LNCS 720-729

DOI: 10.1007/978-3-540-74171-8_71

Abstract

Text categorization (TC) has become one of the most researched fields in NLP. In this paper, we address TC with a two-step feature selection approach. First, we cluster the words that appear in the texts according to their distribution across categories. Then we extract concepts, which are DEF terms in HowNet, from these clusters. Extraction operates on word clusters rather than on single words. This method retains the generalization ability of concept-extraction-based TC while also making full use of occurrences of new words that are not found in the concept thesaurus. We evaluate our feature selection method on the Sogou corpus for TC with an SVM classifier. Our experiments show that the method improves TC performance in all categories. © Springer-Verlag Berlin Heidelberg 2007.
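
The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, so the Jensen-Shannon divergence, the greedy single-pass clustering, and the `threshold` value are all assumptions chosen for brevity. The `thesaurus` dictionary is a hypothetical toy stand-in for HowNet DEF terms, and every function name is illustrative.

```python
import math
from collections import Counter, defaultdict


def category_distributions(docs):
    """Map each word to P(category | word), estimated from labeled token lists."""
    counts = defaultdict(Counter)
    for tokens, label in docs:
        for word in tokens:
            counts[word][label] += 1
    labels = sorted({label for _, label in docs})
    dists = {}
    for word, per_label in counts.items():
        total = sum(per_label.values())
        dists[word] = [per_label[label] / total for label in labels]
    return dists


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def cluster_words(dists, threshold=0.1):
    """Assumed clustering step: a word joins the first cluster whose seed
    distribution lies within `threshold`, otherwise it starts a new cluster."""
    clusters = []  # list of (seed distribution, member words)
    for word in sorted(dists):
        for seed, members in clusters:
            if js_divergence(dists[word], seed) <= threshold:
                members.append(word)
                break
        else:
            clusters.append((dists[word], [word]))
    return [members for _, members in clusters]


def cluster_concepts(clusters, thesaurus):
    """Label each cluster with the most common concept among its known words.
    Words missing from the thesaurus inherit the concept of their cluster."""
    mapping = {}
    for index, members in enumerate(clusters):
        votes = Counter(thesaurus[w] for w in members if w in thesaurus)
        concept = votes.most_common(1)[0][0] if votes else f"cluster_{index}"
        for word in members:
            mapping[word] = concept
    return mapping
```

In this sketch, a word absent from the toy thesaurus still receives a concept feature through its cluster, which mirrors the abstract's point about making use of new words. The resulting word-to-concept mapping could then be used to build feature vectors for a classifier such as an SVM.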

Cite

He, Y., & Jiang, M. (2007). Text categorization using distributional clustering and concept extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4681 LNCS, pp. 720–729). Springer Verlag. https://doi.org/10.1007/978-3-540-74171-8_71
