Fast approximate text document clustering using compressive sampling

Laurence A.F. Park

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6912 LNAI(PART 2) 565-580

DOI: 10.1007/978-3-642-23783-6_36

Abstract

Document clustering involves repeated scanning of a document set; as the set grows, the time required for clustering increases, and the task may even become infeasible due to computational constraints. Compressive sampling is a feature sampling technique that allows a vector to be perfectly reconstructed from a small number of samples, provided that the vector is sparse in some known domain. In this article, we apply the theory behind compressive sampling to the document clustering problem using k-means clustering. We provide a method of computing high-accuracy clusters in a fraction of the time required to cluster the documents directly. The sampling is performed using the Discrete Fourier Transform and the Discrete Cosine Transform. We provide empirical results showing that compressive sampling gives a 14-fold increase in speed with little reduction in accuracy on 7,095 documents. We also provide a very accurate clustering of a 231,219-document set, with a 20-fold increase in speed compared to performing k-means clustering directly on the document set. This shows that compressive clustering is a useful tool for quickly computing approximate clusters. © 2011 Springer-Verlag.
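
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact procedure, which the abstract does not spell out. It assumes that each document is a sparse term-count vector, that a small random subset of its Discrete Cosine Transform coefficients serves as the compressed representation, and that plain k-means is then run on those short vectors. The function names (`dct_samples`, `kmeans`, `make_docs`), the toy data, and the choice of 16 out of 64 coefficients are all hypothetical.

```python
import math
import random


def dct_samples(vec, freqs):
    """DCT-II coefficients of `vec` at the frequencies in `freqs` only."""
    n = len(vec)
    nonzero = [(j, x) for j, x in enumerate(vec) if x]  # term vectors are sparse
    return [sum(x * math.cos(math.pi * (j + 0.5) * f / n) for j, x in nonzero)
            for f in freqs]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_docs(rng, vocab_size=64, per_topic=6):
    """Toy term-count vectors: topic 0 uses terms 0-15, topic 1 uses terms 32-47."""
    docs = []
    for topic_start in (0, 32):
        for _ in range(per_topic):
            vec = [0.0] * vocab_size
            for term in rng.sample(range(topic_start, topic_start + 16), 5):
                vec[term] = float(rng.randint(1, 3))
            docs.append(vec)
    return docs


rng = random.Random(0)  # fixed seed so the sketch is repeatable
docs = make_docs(rng)
freqs = sorted(rng.sample(range(len(docs[0])), 16))  # keep 16 of 64 coefficients
compressed = [dct_samples(d, freqs) for d in docs]
labels = kmeans(compressed, k=2)
print(labels)
```

In this sketch each k-means distance computation touches 16 values instead of 64, which is where a speed-up would come from. How closely the compressed clustering matches a clustering of the full vectors depends on the data and on how many coefficients are kept.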

Citation (APA)

Park, L. A. F. (2011). Fast approximate text document clustering using compressive sampling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6912 LNAI, pp. 565–580). https://doi.org/10.1007/978-3-642-23783-6_36
