Multilingual document clustering, topic extraction and data transformations

Joaquim Silva; João Mexia; Carlos A. Coelho; Gabriel Lopes

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2001) 2258 LNAI 74-87

DOI: 10.1007/3-540-45329-6_11

8 Citations

1 Reader

Abstract

This paper describes a statistics-based approach to clustering documents and extracting cluster topics. Relevant Expressions (REs) are extracted from corpora and used as the base features for clustering. These features are transformed and then reduced, using an approach based on Principal Components Analysis, to a small set of document classification features. The best number of clusters is found by Model-Based Clustering Analysis. Data transformations that bring the features closer to a normal distribution are applied, and the results are discussed. The most important REs are extracted from each cluster and taken as cluster topics. © Springer-Verlag Berlin Heidelberg 2001.
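The abstract does not say which transformations the authors applied to approximate normality. As a rough illustration only, the sketch below assumes a log(1 + x) transform, a common choice for right-skewed count data, and compares sample skewness before and after. The function names and the counts are hypothetical and are not taken from the paper.

```python
import math
import statistics


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def log_transform(values):
    """Assumed transformation (not from the paper): log(1 + x)."""
    return [math.log1p(v) for v in values]


# Hypothetical RE occurrence counts across ten documents (made-up data).
re_counts = [1, 1, 2, 2, 3, 4, 5, 8, 13, 40]

raw_skew = skewness(re_counts)
transformed_skew = skewness(log_transform(re_counts))

print(f"skewness before: {raw_skew:.2f}")
print(f"skewness after:  {transformed_skew:.2f}")
```

A skewness closer to zero after the transform suggests the feature is more symmetric, which is one informal indication that it sits closer to a normal shape. It is not a normality test in itself.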

Cite

Silva, J., Mexia, J., Coelho, C. A., & Lopes, G. (2001). Multilingual document clustering, topic extraction and data transformations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2258 LNAI, pp. 74–87). Springer Verlag. https://doi.org/10.1007/3-540-45329-6_11
