Clustering is an essential data analysis technique that extracts distribution patterns or groups of similar items from data. Because clustering plays a crucial role in many scientific applications, much research is concerned with developing new algorithms for big data clustering. Despite this, clustering remains a challenge in big data, as the size and variety of real-world datasets are rapidly increasing. Recently, several clustering algorithms have been proposed to handle large datasets using the MapReduce framework. This paper provides an overview of clustering algorithms that use MapReduce; it introduces a categorization of these algorithms based on the clustering technique and discusses their strengths and limitations. Finally, the paper discusses the main issues of each clustering approach in the MapReduce framework to serve as a step toward future enhancements.
Khader, M. S., & Al-Naymat, G. (2021). Big Data Clustering Using MapReduce Framework: A Review. In Advances in Intelligent Systems and Computing (Vol. 1251 AISC, pp. 575–593). Springer. https://doi.org/10.1007/978-3-030-55187-2_42