Efficient results merging for parallel data clustering using MapReduce

Abstract

Data clustering partitions data into sub-groups using a distance measure. Clustering a large amount of data requires considerable execution time, and several works have proposed parallelism to overcome this problem. One parallel technique partitions the data and processes each partition separately; the results obtained from each partition are then merged to produce the final cluster configuration. An inappropriate merging technique leads to inaccurate final centroids and mediocre clustering quality. In this paper, we propose two merging techniques to improve the clustering quality. The first solution merges the results using the K-means algorithm, and the second uses a genetic algorithm. The results show the efficiency of the proposed strategies.
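
To make the partition-then-merge idea concrete, here is a minimal Python sketch of the first strategy (merging with K-means). It is an illustration under stated assumptions, not the authors' implementation: the function names `kmeans` and `merge_with_kmeans`, the deterministic seeding, and the unweighted treatment of local centroids are all hypothetical choices, and the "map" and "reduce" steps are ordinary function calls rather than a real MapReduce job.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on tuples of numbers.

    Assumption: seeds are picked deterministically from the sorted
    points so that the sketch is reproducible.
    """
    pts = sorted(points)
    step = max(1, len(pts) // k)
    centroids = [pts[min(i * step + step // 2, len(pts) - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # New centroid = mean of its group; keep the old one if the group is empty.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def merge_with_kmeans(partitions, k):
    """Cluster each partition alone, then cluster the local centroids."""
    # "Map" step: each partition yields k local centroids.
    local = [c for part in partitions for c in kmeans(part, k)]
    # "Reduce" step: K-means over the local centroids gives the final k.
    # Simplification: local centroids are not weighted by cluster size.
    return sorted(kmeans(local, k))
```

For example, with `parts = [[(0, 0), (0, 1), (10, 10), (10, 11)], [(1, 0), (1, 1), (11, 10), (11, 11)]]`, calling `merge_with_kmeans(parts, 2)` returns `[(0.5, 0.5), (10.5, 10.5)]`. A genetic-algorithm merge would replace only the reduce step, searching over candidate sets of final centroids instead of running K-means on the local ones.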

Cite

Bousbaci, A., & Kamel, N. (2016). Efficient results merging for parallel data clustering using MapReduce. In Advances in Intelligent Systems and Computing (Vol. 474, pp. 349–357). Springer Verlag. https://doi.org/10.1007/978-3-319-40162-1_38
