Improved K-means map reduce algorithm for big data cluster analysis

ISSN: 2278-3075

Abstract

Big data, meaning large volumes of widely varied data generated at high velocity every day, contains valuable information that is not yet known. Mining knowledge from such data requires fast and scalable analytics. Clustering is an important data mining technique, and K-means clustering is of particular interest because of its simplicity. However, standard K-means has limitations when used to analyze big data, which leaves room for improvement. Distributed processing frameworks and algorithms help meet the performance and scalability requirements of analyzing large datasets. This work designs a parallel K-means clustering algorithm by improving standard K-means within the MapReduce paradigm. It presents a method for finding the initial cluster seeds instead of selecting them at random, since random seeding is a major drawback of standard K-means when clustering big data. The work also reduces the algorithm's dependence on repeated MapReduce iterations. In addition, the algorithm takes both between-cluster separation and within-cluster compactness into account to achieve accurate clustering. The algorithm runs in the cloud on Amazon Elastic MapReduce 5.x, which distributes the clustering job in parallel across multiple low-cost nodes. The proposed algorithm is tested on real datasets from the UC Irvine Machine Learning Repository. The results confirm that it achieves higher performance than classical K-means when clustering large datasets.
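The abstract does not specify the seeding rule, the exact compactness and separation measures, or the job layout, so the following is only a minimal single-process Python sketch of the general idea under stated assumptions. Each K-means round is written as a map step (assign every point in a data split to its nearest centroid and emit per-cluster partial sums) and a reduce step (merge partial sums into new centroids). Seeding uses farthest-first traversal started from the point nearest the data mean; this is a swapped-in deterministic technique, not the authors' seed method. All function names (`farthest_first_seeds`, `map_split`, `reduce_all`, `kmeans_mapreduce`, `compactness_separation`) are hypothetical, and no Hadoop or Amazon EMR API is used.

```python
# Hypothetical sketch: K-means as map/reduce rounds in one process (no Hadoop/EMR).
# Seeding is farthest-first traversal, a stand-in, NOT the paper's seed method.
from collections import defaultdict


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    # Index of the centroid closest to `point`.
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def farthest_first_seeds(points, k):
    # Deterministic seeding (assumed, for illustration): start at the point nearest
    # the data mean, then repeatedly add the point farthest from all chosen seeds.
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    seeds = [min(points, key=lambda p: sq_dist(p, mean))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return [tuple(s) for s in seeds]


def map_split(split, centroids):
    # Mapper with in-mapper combining: emit (cluster_id, (vector_sum, count)).
    partial = {}
    for p in split:
        c = nearest(p, centroids)
        s, n = partial.get(c, ([0.0] * len(p), 0))
        partial[c] = ([a + b for a, b in zip(s, p)], n + 1)
    return list(partial.items())


def reduce_all(mapped_pairs, old_centroids):
    # Reducer: add partial sums per cluster, then divide by the total count.
    sums, counts = {}, defaultdict(int)
    for c, (s, n) in mapped_pairs:
        sums[c] = [a + b for a, b in zip(sums.get(c, [0.0] * len(s)), s)]
        counts[c] += n
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(x / counts[c] for x in sums[c]) if c in sums else tuple(old_centroids[c])
        for c in range(len(old_centroids))
    ]


def kmeans_mapreduce(splits, k, max_iter=20, tol=1e-9):
    # Driver: one map/reduce round per iteration until centroids stop moving.
    points = [p for split in splits for p in split]
    centroids = farthest_first_seeds(points, k)
    for _ in range(max_iter):
        mapped = [pair for split in splits for pair in map_split(split, centroids)]
        new = reduce_all(mapped, centroids)
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids


def compactness_separation(points, centroids):
    # Illustrative measures only (requires k >= 2):
    # within = mean squared distance of each point to its nearest centroid;
    # between = smallest squared distance between any two centroids.
    within = sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points) / len(points)
    between = min(
        sq_dist(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:]
    )
    return within, between
```

For example, with two splits `[[(1.0, 1.0), (1.5, 2.0), (8.0, 8.0)], [(1.0, 0.5), (9.0, 8.5), (8.5, 9.0)]]` and `k=2`, the seeds are `(1.5, 2.0)` and `(9.0, 8.5)`, and the centroids settle at about `(1.17, 1.17)` and `(8.5, 8.5)` after two rounds. Emitting partial sums rather than raw points keeps the data shuffled between map and reduce proportional to the number of clusters, not the number of points.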

Citation (APA)

Agnivesh, Pandey, R., & Singh, A. (2019). Improved K-means map reduce algorithm for big data cluster analysis. International Journal of Innovative Technology and Exploring Engineering, 8(8), 1796–1802.
