Data clustering partitions data into sub-groups using a distance measure. Clustering large volumes of data requires considerable execution time. Several approaches have been proposed to address this problem using parallelism. One such technique partitions the data and processes each partition separately; the results obtained from each partition are then merged to produce the final cluster configuration. Using an inappropriate merging technique leads to inaccurate final centroids and mediocre clustering quality. In this paper, we propose two merging techniques to improve clustering quality. The first merges the results using the K-means algorithm, and the second uses a genetic algorithm. The results demonstrate the efficiency of the proposed strategies.
CITATION STYLE
Bousbaci, A., & Kamel, N. (2016). Efficient results merging for parallel data clustering using MapReduce. In Advances in Intelligent Systems and Computing (Vol. 474, pp. 349–357). Springer Verlag. https://doi.org/10.1007/978-3-319-40162-1_38