Automatic category structure generation and categorization of Chinese text documents

Hsin Chang Yang; Chung Hong Lee

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2000) 1910 673-678

DOI: 10.1007/3-540-45372-5_83

4 Citations

19 Readers

Abstract

Recently, knowledge discovery and data mining in unstructured or semi-structured texts (text mining) has attracted considerable attention from both the commercial and research fields. One aspect of text mining is automatic text categorization, which assigns a text document to some predefined category according to the correlation between the document and the category. Traditionally, the categories are arranged in a hierarchical manner to achieve effective searching and indexing as well as easy comprehension for humans. The determination of categories and their hierarchical structures has mostly been done by human experts. In this work, we developed an approach to automatically generate categories and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map is trained on the document collection to form two feature maps. We then analyzed the two maps to obtain the categories and the structure among them. Although the corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.

Cite

Yang, H. C., & Lee, C. H. (2000). Automatic category structure generation and categorization of Chinese text documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1910, pp. 673–678). Springer Verlag. https://doi.org/10.1007/3-540-45372-5_83
