Recently, knowledge discovery and data mining in unstructured or semi-structured texts (text mining) has attracted considerable attention from both commercial and research fields. One aspect of text mining is automatic text categorization, which assigns a text document to a predefined category according to the correlation between the document and the category. Traditionally, the categories are arranged hierarchically to support effective searching and indexing and to make them easy for humans to comprehend. The determination of categories and their hierarchical structure has mostly been done by human experts. In this work, we developed an approach to automatically generate categories and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. The document collection is used to train a self-organizing map, which forms two feature maps. We then analyzed the two maps to obtain the categories and the structure among them. Although the corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided such documents can be transformed into a list of separate terms.
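To make the training step concrete, the following is a minimal sketch of a self-organizing map trained on binary term vectors. It is not the authors' implementation. The function names (`best_unit`, `train_som`, `build_maps`), the grid size, the learning-rate and neighborhood schedules, the threshold, and the toy data are all assumptions made for illustration. The abstract does not say what the two feature maps contain, so reading them as a map of documents per neuron and a map of terms per neuron is also an assumption.

```python
import math
import random


def best_unit(weights, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)),
    )


def train_som(docs, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on term vectors (one list of numbers per document).

    All hyperparameters here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_terms = len(docs[0])
    # One weight vector per neuron, randomly initialized in [0, 1).
    weights = [[rng.random() for _ in range(n_terms)] for _ in range(rows * cols)]
    # Grid position of each neuron, used to measure neighborhood distance.
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    order = list(range(len(docs)))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            wr, wc = grid[best_unit(weights, x)]
            for n, (r, c) in enumerate(grid):
                # Gaussian neighborhood centered on the winning neuron.
                h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2.0 * sigma ** 2))
                weights[n] = [w + lr * h * (v - w) for w, v in zip(weights[n], x)]
    return weights


def build_maps(docs, weights, threshold=0.5):
    """Label the trained neurons with documents and with terms (assumed reading)."""
    doc_map = {}
    for i, x in enumerate(docs):
        doc_map.setdefault(best_unit(weights, x), []).append(i)
    # A term is attached to a neuron when its weight component exceeds the threshold.
    word_map = {
        n: [t for t, w in enumerate(wv) if w > threshold]
        for n, wv in enumerate(weights)
    }
    return doc_map, word_map


# Toy corpus: each row is a document, each column marks presence of a term.
docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
weights = train_som(docs, rows=2, cols=2)
doc_map, word_map = build_maps(docs, weights)
print(doc_map)
print(word_map)
```

In this sketch, documents that share terms tend to land on the same or neighboring neurons, which is the property a later analysis step could use to derive categories. How the paper actually analyzes the maps to build the hierarchy is not described in the abstract and is not attempted here.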
CITATION STYLE
Yang, H. C., & Lee, C. H. (2000). Automatic category structure generation and categorization of Chinese text documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1910, pp. 673–678). Springer Verlag. https://doi.org/10.1007/3-540-45372-5_83