Compressed K-Means for large-scale clustering

Abstract

Large-scale clustering is widely used across many applications and has received considerable attention. Most existing clustering methods incur high computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited to fast clustering. CKM enjoys two key benefits: 1) storage is significantly reduced by representing data points as binary codes; 2) distance computation is very efficient under the Hamming metric between binary codes. We propose to learn the binary codes and the clusters jointly within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms state-of-the-art large-scale clustering methods in both computation and memory cost while achieving comparable clustering accuracy.
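To make the two benefits named in the abstract concrete, the following is a minimal sketch of k-means-style clustering over binary codes with Hamming distance. It is not the CKM algorithm: the abstract says CKM learns codes and clusters jointly, whereas this sketch assumes a fixed, hypothetical encoder (sign of random projections) and a simple majority-vote center update. The function names `binarize`, `hamming_to_centers`, and `binary_kmeans` are illustrative and do not come from the paper.

```python
import numpy as np

def binarize(X, n_bits, seed=0):
    """Hypothetical encoder (an assumption, not CKM's learned codes):
    sign of random projections, returned as an (n, n_bits) array of 0/1."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))
    return (X @ W > 0).astype(np.uint8)

def hamming_to_centers(codes, centers):
    """Hamming distance from every code to every center.
    Bits are packed into bytes, XORed, and the set bits are counted."""
    packed_codes = np.packbits(codes, axis=1)      # (n, ceil(n_bits / 8))
    packed_centers = np.packbits(centers, axis=1)  # (k, ceil(n_bits / 8))
    xor = packed_codes[:, None, :] ^ packed_centers[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)  # (n, k)

def binary_kmeans(codes, k, n_iter=10, seed=0):
    """Lloyd-style iterations in Hamming space with binary centers."""
    rng = np.random.default_rng(seed)
    centers = codes[rng.choice(len(codes), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: nearest center under Hamming distance.
        labels = hamming_to_centers(codes, centers).argmin(axis=1)
        # Update step: per-bit majority vote among cluster members.
        for j in range(k):
            members = codes[labels == j]
            if len(members):  # keep the old center if the cluster is empty
                centers[j] = (members.mean(axis=0) >= 0.5).astype(np.uint8)
    # Recompute labels so they match the returned centers.
    labels = hamming_to_centers(codes, centers).argmin(axis=1)
    return labels, centers

if __name__ == "__main__":
    X = np.random.default_rng(42).standard_normal((1000, 128))
    codes = binarize(X, n_bits=32)
    labels, centers = binary_kmeans(codes, k=5)
    print(labels.shape, centers.shape)
```

On storage, a 128-dimensional float32 vector occupies 512 bytes, while a packed 32-bit code occupies 4 bytes. The dimensions and code length here are chosen for illustration only and are not taken from the paper's experiments.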

Cite

Shen, X., Liu, W., Tsang, I., Shen, F., & Sun, Q. S. (2017). Compressed K-Means for large-scale clustering. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 2527–2533). AAAI Press. https://doi.org/10.1609/aaai.v31i1.10852
