In this paper we take a step towards understanding compression distances by analyzing the relevance of contextual information in compression-based text clustering. To do so, we explore two kinds of word removal: one that preserves part of the contextual information despite removing words, and one that does not. We show that removing words while preserving contextual information improves the accuracy of compression-based text clustering, whereas removing words in a way that discards that contextual information worsens the clustering results. © 2010 Springer-Verlag Berlin Heidelberg.
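The abstract does not define the compression distance or the two removal schemes. The sketch below is an illustration under stated assumptions, not the authors' method: it uses the normalized compression distance (NCD) with zlib as the compressor, plus two hypothetical removal functions. One masks each removed word with asterisks of the same length, so word positions and boundaries survive; the other deletes removed words outright, so the surviving words become adjacent and lose their original context. The keep-list `vocab` and all function names are illustrative assumptions.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Size in bytes of the zlib-compressed input (level 9); zlib is an
    # assumed stand-in for whatever compressor is actually used.
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    # Normalized compression distance:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def remove_keep_context(text: str, vocab: set[str]) -> str:
    # Hypothetical scheme 1: every word not in the (lowercase) keep-list is
    # replaced by '*' repeated len(word) times, preserving positions.
    return " ".join(w if w.lower() in vocab else "*" * len(w) for w in text.split())


def remove_drop_context(text: str, vocab: set[str]) -> str:
    # Hypothetical scheme 2: words not in the keep-list are deleted,
    # so their positional context is lost.
    return " ".join(w for w in text.split() if w.lower() in vocab)


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    keep = {"cat", "mat"}  # illustrative keep-list, not from the paper
    print(remove_keep_context(doc, keep))  # *** cat *** ** *** mat
    print(remove_drop_context(doc, keep))  # cat mat
```

Under these assumptions, the pairwise `ncd` values over the masked (or pruned) documents would form a distance matrix that is then passed to a clustering algorithm; which algorithm to use is likewise an assumption here.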
Granados, A., Martínez, R., Camacho, D., & De Borja Rodríguez, F. (2010). Relevance of contextual information in compression-based text clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6283 LNCS, pp. 259–266). https://doi.org/10.1007/978-3-642-15381-5_32