Big Data is the term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. Clustering is an essential tool for analysing Big Data, and multi-machine clustering is a very efficient technique for mining and analysing such data for insights. The K-Means partition-based clustering algorithm is one of the algorithms used to cluster Big Data. One of its main disadvantages is its reliance on randomly chosen values for the number of clusters K and the initial centroids. This leads to more iterations and longer execution times before the algorithm arrives at the optimal centroids. The Sorting-based K-Means clustering algorithm (SBKMA) using the multi-machine technique is another method for analysing Big Data: the data is first sorted using Hadoop MapReduce, and the mean is then taken as the centroid. This paper proposes a new algorithm, SBKMEDA (Sorting-based K-Median clustering algorithm using multi-machine technique for Big Data), which sorts the data and uses the median instead of the mean as the centroid, for better accuracy and faster cluster formation.
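The idea of seeding centroids from sorted data can be illustrated with a small single-machine sketch. It is not the authors' implementation and does not use Hadoop MapReduce. It assumes one-dimensional numeric data, and it assumes that the sorted data is cut into K contiguous chunks of near-equal size whose median (SBKMEDA-style) or mean (SBKMA-style) becomes a starting centroid. The abstract does not specify the exact partitioning scheme. The function names `sorted_initial_centroids` and `k_median_1d` are hypothetical.

```python
from statistics import mean, median


def sorted_initial_centroids(values, k, use_median=True):
    """Pick k initial centroids from sorted data (hypothetical helper).

    Assumption: the sorted data is split into k contiguous chunks of
    near-equal size; each chunk's median (SBKMEDA-style) or mean
    (SBKMA-style) becomes a starting centroid.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    data = sorted(values)
    size, extra = divmod(len(data), k)
    centroids, start = [], 0
    for i in range(k):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunk = data[start:end]
        centroids.append(median(chunk) if use_median else mean(chunk))
        start = end
    return centroids


def k_median_1d(values, k, max_iter=100):
    """Lloyd-style K-median on 1-D data, seeded from sorted chunks."""
    centroids = sorted_initial_centroids(values, k, use_median=True)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in values:
            # Assign each point to its nearest centroid (absolute distance).
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Move each centroid to the median of its cluster; keep the old
        # centroid if a cluster ends up empty.
        new_centroids = [median(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters
```

For example, `k_median_1d([1, 2, 3, 10, 11, 12, 50, 51, 52], 3)` seeds the centroids at 2, 11 and 51 and converges immediately, because the sorted chunks already match the natural groups. Using the median rather than the mean also makes a seed less sensitive to extreme values. On `[1, 2, 3, 100]` with K = 1, the median seed is 2.5 while the mean seed is 26.5.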
Mahima Jane, E., & George Dharma Prakash Raj, E. (2018). SBKMEDA: Sorting-based K-median clustering algorithm using multi-machine technique for big data. In Advances in Intelligent Systems and Computing (Vol. 645, pp. 219–225). Springer Verlag. https://doi.org/10.1007/978-981-10-7200-0_19