A novel MapReduce based k-Means clustering

Ankita Sinha; Prasanta K. Jana

Conference Proceedings

Advances in Intelligent Systems and Computing (2017) 458 247-255

DOI: 10.1007/978-981-10-2035-3_26

Abstract

Data clustering is indispensable in today’s era of data deluge. k-Means is a popular partition-based clustering technique. However, as data grow in size and complexity, it is no longer suitable, and there is an urgent need to shift towards parallel algorithms. We present a MapReduce-based k-Means clustering algorithm that is scalable and fault tolerant. Its major advantage is that it determines the number of clusters dynamically, unlike k-Means, where the final number of clusters has to be specified in advance. MapReduce jobs are iteration sensitive, because repeated reads from and writes to the file system increase both cost and computation time. The proposed algorithm is not iterative: it reads the data from the file system once and writes the output back once. We show that the proposed algorithm performs better than an Improved MapReduce based k-Means clustering algorithm.
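To make the iteration cost mentioned in the abstract concrete, the following is a minimal in-memory sketch of the conventional iterative MapReduce formulation of k-Means, in which every refinement step corresponds to one full job. It is not the authors' algorithm, whose details the abstract does not give. The function names (`map_phase`, `reduce_phase`, `iterative_kmeans`) and the `max_iters` and `tol` parameters are hypothetical, and plain Python generators stand in for a real MapReduce framework.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce step: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # centroids with no members stay unchanged
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def iterative_kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Conventional scheme: each loop pass stands in for one MapReduce job,
    i.e. one full read of the data and one write of the updated centroids."""
    jobs = 0
    for _ in range(max_iters):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        jobs += 1
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids, jobs


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    final, jobs = iterative_kmeans(data, [(0, 0), (10, 10)])
    print(final, jobs)
```

In this sketch the number of clusters is fixed by the initial centroid list and the job count grows with the number of refinement passes, which are the two limitations the abstract says the proposed single-pass algorithm avoids.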

Cite

Sinha, A., & Jana, P. K. (2017). A novel mapreduce based k-means clustering. In Advances in Intelligent Systems and Computing (Vol. 458, pp. 247–255). Springer Verlag. https://doi.org/10.1007/978-981-10-2035-3_26
