Efficient results merging for parallel data clustering using MapReduce

Abdelhak Bousbaci; Nadjet Kamel

Conference Proceedings

Advances in Intelligent Systems and Computing (2016) 474 349-357

DOI: 10.1007/978-3-319-40162-1_38

Abstract

Data clustering partitions data into subgroups using a distance measure. Clustering large amounts of data requires considerable execution time, and several works have proposed parallelism to overcome this problem. One parallel technique partitions the data and processes each partition separately; the results obtained from each partition are then merged to produce the final cluster configuration. An inappropriate merging technique leads to inaccurate final centroids and mediocre clustering quality. In this paper, we propose two merging techniques to improve clustering quality. The first merges the results using the K-means algorithm, and the second uses a genetic algorithm. The results demonstrate the efficiency of the proposed strategies.
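The first merging strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation details, so everything below is an assumption made for illustration: the function names (`local_kmeans`, `merge_with_kmeans`), the farthest-first initialization, and the use of plain Python in place of an actual MapReduce framework. The "map" step clusters each partition independently, and the "reduce" step pools the local centroids and runs K-means on them to obtain the final centroids. The genetic-algorithm variant is not shown.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Component-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def farthest_first(points, k):
    # Deterministic initialization (an illustrative choice, not from the paper):
    # start from the first point, then repeatedly take the point farthest
    # from all centroids chosen so far.
    chosen = [points[0]]
    while len(chosen) < k:
        chosen.append(max(points, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen

def kmeans(points, k, max_iters=100):
    # Plain Lloyd iterations; returns a list of k centroids.
    centroids = farthest_first(points, k)
    for _ in range(max_iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def local_kmeans(partitions, k):
    # "Map" step: cluster every partition on its own.
    return [kmeans(part, k) for part in partitions]

def merge_with_kmeans(local_centroids, k):
    # "Reduce" step: pool all local centroids and cluster them again,
    # so the final centroids are the centroids of the local centroids.
    pooled = [c for centroids in local_centroids for c in centroids]
    return kmeans(pooled, k)

# Usage: two partitions of a tiny 2-D data set with two obvious groups.
partitions = [
    [(0.0, 0.1), (0.2, 0.0), (9.8, 10.0), (10.1, 9.9)],
    [(0.1, 0.2), (-0.1, 0.0), (10.0, 10.2), (9.9, 10.1)],
]
final_centroids = merge_with_kmeans(local_kmeans(partitions, 2), 2)
```

Pooling the local centroids without weighting them by cluster size is a simplification of this sketch; a weighted merge would be a natural refinement when partitions are unbalanced.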

Cite

Bousbaci, A., & Kamel, N. (2016). Efficient results merging for parallel data clustering using MapReduce. In Advances in Intelligent Systems and Computing (Vol. 474, pp. 349–357). Springer Verlag. https://doi.org/10.1007/978-3-319-40162-1_38
