Automatic category structure generation and categorization of Chinese text documents

Abstract

Recently, knowledge discovery and data mining in unstructured or semi-structured texts (text mining) has attracted much attention from both commercial and research fields. One aspect of text mining is automatic text categorization, which assigns a text document to a predefined category according to the correlation between the document and the category. Traditionally, the categories are arranged in a hierarchical manner to achieve effective searching and indexing as well as easy comprehension for humans. The determination of categories and their hierarchical structures has mostly been done by human experts. In this work, we developed an approach to automatically generate categories and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map is trained on the document collection to form two feature maps. We then analyzed the two maps to obtain the categories and the structure among them. Although the corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
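The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: documents already split into terms are encoded as binary term vectors, a small self-organizing map is trained on them, and two maps are read off the trained weights. The function names, the binary encoding, the grid size, the decay schedules, and the way the document map and term map are derived are all assumptions made for illustration, not the authors' published procedure.

```python
import math
import random


def build_vocabulary(docs):
    """Collect the distinct terms of all documents in sorted order."""
    return sorted({term for doc in docs for term in doc})


def to_vector(doc, vocab):
    """Binary term vector: 1.0 if the term occurs in the document (assumed encoding)."""
    terms = set(doc)
    return [1.0 if t in terms else 0.0 for t in vocab]


def best_matching_unit(weights, vec):
    """Index of the neuron whose weight vector is closest to vec (squared Euclidean)."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(w, vec))
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i]))


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns one weight vector per neuron (row-major)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
        for vec in rng.sample(vectors, len(vectors)):  # shuffled pass over documents
            bmu = best_matching_unit(weights, vec)
            br, bc = divmod(bmu, cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (vec[k] - w[k])
    return weights


def document_map(weights, vectors):
    """Assumed first map: each document is assigned to its best matching neuron."""
    return [best_matching_unit(weights, v) for v in vectors]


def term_map(weights, vocab):
    """Assumed second map: each term is assigned to the neuron weighting it most."""
    return {
        term: max(range(len(weights)), key=lambda i: weights[i][k])
        for k, term in enumerate(vocab)
    }


if __name__ == "__main__":
    # Hypothetical, already segmented documents (placeholder English tokens).
    docs = [
        ["stock", "market", "bank"],
        ["stock", "market", "bank"],
        ["football", "match", "goal"],
    ]
    vocab = build_vocabulary(docs)
    vectors = [to_vector(d, vocab) for d in docs]
    weights = train_som(vectors, rows=2, cols=2)
    print(document_map(weights, vectors))
    print(term_map(weights, vocab))
```

Neurons that attract both documents and terms could then serve as candidate categories labeled by their terms. How the hierarchy among categories is extracted from the two maps is not described in the abstract and is left out of this sketch.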

Cite

Yang, H. C., & Lee, C. H. (2000). Automatic category structure generation and categorization of Chinese text documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1910, pp. 673–678). Springer Verlag. https://doi.org/10.1007/3-540-45372-5_83
