Relevance of contextual information in compression-based text clustering

Ana Granados; Rafael Martínez; David Camacho; Francisco De Borja Rodríguez

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010), Vol. 6283 LNCS, pp. 259–266

DOI: 10.1007/978-3-642-15381-5_32

Abstract

In this paper we take a step towards understanding compression distances by analyzing the relevance of contextual information in compression-based text clustering. To do so, we explore two kinds of word removal: one that preserves part of the contextual information and one that does not. We show that removing words while preserving contextual information improves the accuracy of compression-based text clustering, whereas removing words in a way that loses that information makes the clustering results worse. © 2010 Springer-Verlag Berlin Heidelberg.
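
The two kinds of word removal can be illustrated with a minimal sketch. The abstract does not name the compressor, the distance, or the way words are selected, so everything below is an assumption made for illustration: the distance is the standard normalized compression distance (NCD) computed with zlib, removed words are either replaced by asterisks of the same length (keeping their position and length as context) or deleted outright, and the helper names `compressed_size`, `ncd`, and `remove_words` are hypothetical.

```python
import re
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed text (assumed compressor)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def remove_words(text: str, words: set, keep_context: bool) -> str:
    """Remove the given lowercase words from text.

    keep_context=True  replaces each removed word with asterisks of the
                       same length, so its position and length survive.
    keep_context=False deletes the word and collapses the whitespace,
                       so no trace of it remains.
    Both behaviours are illustrative assumptions, not the paper's code.
    """
    def substitute(match):
        word = match.group(0)
        if word.lower() not in words:
            return word
        return "*" * len(word) if keep_context else ""

    result = re.sub(r"[A-Za-z]+", substitute, text)
    return result if keep_context else " ".join(result.split())


if __name__ == "__main__":
    sentence = "the cat sat on the mat"
    to_remove = {"the", "on"}
    print(remove_words(sentence, to_remove, keep_context=True))
    print(remove_words(sentence, to_remove, keep_context=False))
```

A clustering experiment along these lines would apply one of the two removals to every document, compute the pairwise `ncd` matrix, and feed it to a clustering algorithm; which words to remove and which clustering method to use are left open here because the abstract does not state them.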

Cite

Granados, A., Martínez, R., Camacho, D., & De Borja Rodríguez, F. (2010). Relevance of contextual information in compression-based text clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6283 LNCS, pp. 259–266). https://doi.org/10.1007/978-3-642-15381-5_32
