A novel mapreduce based k-means clustering


Abstract

Data clustering is unavoidable in today’s era of data deluge. k-Means is a popular partition-based clustering technique. However, as data grow in size and complexity, the standard algorithm is no longer suitable, and there is an urgent need to shift towards parallel algorithms. We present a MapReduce-based k-Means clustering algorithm that is scalable and fault tolerant. Its major advantage is that it determines the number of clusters dynamically, unlike k-Means, where the final number of clusters has to be specified in advance. MapReduce jobs are sensitive to iteration, because repeated reads from and writes to the file system increase both cost and computation time. The proposed algorithm is not iterative: it reads the data from the file system once and writes the output back once. We show that the proposed algorithm performs better than an Improved MapReduce-based k-Means clustering algorithm.
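To make the iteration cost concrete, the following is a minimal sketch of the conventional iterative formulation of k-Means as map and reduce steps. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names (`map_phase`, `reduce_phase`, `kmeans`) and the in-memory lists standing in for the distributed file system are assumptions made for illustration only.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: replace each centroid with the mean of the points assigned to it."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # centroids with no points stay unchanged
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Run map/reduce rounds until the centroids stop changing.

    Each round stands in for one MapReduce job, i.e. one full read of the
    input and one write of the centroids (illustrative assumption).
    """
    passes = 0
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        passes += 1
        if updated == centroids:
            break
        centroids = updated
    return centroids, passes


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    final, passes = kmeans(data, [(0.0, 0.0), (10.0, 0.0)])
    print(final, passes)
```

In this formulation the number of clusters is fixed by the length of the initial `centroids` list, and `passes` counts how many times the whole data set is scanned. These are the two costs the abstract says the proposed single-pass method avoids.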

Citation (APA)

Sinha, A., & Jana, P. K. (2017). A novel mapreduce based k-means clustering. In Advances in Intelligent Systems and Computing (Vol. 458, pp. 247–255). Springer Verlag. https://doi.org/10.1007/978-981-10-2035-3_26
