Text clustering on latent thematic spaces: Variants, strengths and weaknesses

Xavier Sevillano; Germán Cobo; Francesc Alías; Joan Claudi Socoró

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4666 LNCS 794-801

DOI: 10.1007/978-3-540-74494-8_99

Citations: 1

Readers: 10

Abstract

Deriving a thematically meaningful partition of an unlabeled text corpus is a challenging task. In comparison to classic term-based document indexing, the use of document representations based on latent thematic generative models can lead to improved clustering. However, determining a priori the optimal indexing technique is not straightforward, as it depends on the clustering problem faced and the partitioning strategy adopted. So as to overcome this indeterminacy, we propose deriving a consensus labeling upon the results of clustering processes executed on several document representations. Experiments conducted on subsets of two standard text corpora evaluate distinct clustering strategies based on latent thematic spaces and highlight the usefulness of consensus clustering to overcome the optimal document indexing indeterminacy. © Springer-Verlag Berlin Heidelberg 2007.
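As an illustration of the consensus idea mentioned in the abstract, the following minimal sketch combines several cluster labelings of the same documents into one. It is not the authors' method: it assumes a simple co-association (evidence accumulation) scheme, in which two documents end up together if they share a cluster in more than half of the input labelings. The function names, the 0.5 threshold, and the toy labelings are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of documents shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same documents")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Group documents that co-occur in more than `threshold` of the partitions.

    Assumption: pairs above the threshold are linked and the connected
    components of the resulting graph are taken as consensus clusters.
    """
    n = len(partitions[0])
    coassoc = co_association(partitions)
    parent = list(range(n))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Hypothetical labelings of six documents, e.g. obtained by clustering
# three different document representations of the same corpus.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, cluster ids permuted
    [0, 0, 1, 1, 1, 1],  # third document assigned differently
]
print(consensus_labels(partitions))
```

Because the co-association matrix only records whether two documents share a cluster, the sketch is insensitive to how each individual labeling numbers its clusters, which is why the permuted second labeling above still agrees with the first.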

Citation (APA)

Sevillano, X., Cobo, G., Alías, F., & Socoró, J. C. (2007). Text clustering on latent thematic spaces: Variants, strengths and weaknesses. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4666 LNCS, pp. 794–801). Springer Verlag. https://doi.org/10.1007/978-3-540-74494-8_99
