The 21st century has brought a tsunami of data generated by human civilization, adding new terms such as Big Data to the vocabulary. Digitization has overtaken almost every major sector and plays a dominant role in the virtual digital world, which has made “Big Data” one of the most debated terms of the present decade. Traditional relational database management systems (RDBMS) cannot cope with the sheer volume of this data or the speed at which it is created. The need to manage and process such gigantic, complex, heterogeneous data has once again followed the age-old rule that “necessity is the mother of invention” and led to Hadoop MapReduce. This work applies the K-means clustering algorithm, using WEKA, to a benchmark MRI dataset from the OASIS database in order to cluster the data by visual similarity. This approach worked up to a threshold data size, beyond which WEKA failed with an “out of memory” error. To overcome this problem, a Map/Reduce version of K-means is implemented on top of Hadoop using R. The algorithm is evaluated using the speedup, scaleup, and sizeup metrics, and its performance improves as the size of the input data increases.
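The Map/Reduce formulation of K-means mentioned above can be illustrated with a minimal, single-process sketch. It is written in Python for illustration only and is not the authors' R/Hadoop implementation: the function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`), the stopping tolerance, and the representation of input splits as in-memory lists are all assumptions. The map step assigns each point to its nearest centroid, and the reduce step averages the points assigned to each centroid.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, (point, 1)) for every point."""
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        yield nearest, (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce: average the points emitted under each centroid index."""
    sums = {}
    counts = defaultdict(int)
    for key, (p, n) in pairs:
        if key not in sums:
            sums[key] = list(p)
        else:
            sums[key] = [s + x for s, x in zip(sums[key], p)]
        counts[key] += n
    updated = []
    for i, c in enumerate(centroids):
        if counts[i]:
            updated.append(tuple(s / counts[i] for s in sums[i]))
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            updated.append(tuple(c))
    return updated


def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until the centroids stop moving.

    `splits` stands in for the input splits a cluster would hand to
    separate mappers; here they are simply processed one after another.
    """
    for _ in range(max_iter):
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = reduce_phase(pairs, centroids)
        shift = max(squared_distance(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    splits = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
    print(kmeans_mapreduce(splits, [(0, 0), (10, 10)]))
```

On a real cluster each split would be processed by a separate mapper, and only the per-cluster sums and counts would travel to the reducers, which is what keeps memory use on any single machine bounded.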
CITATION STYLE
Ayoub Shaikh, T., Badr Shafeeque, U., & Ahamad, M. (2018). An Intelligent Distributed K-means Algorithm over Cloudera/Hadoop. International Journal of Education and Management Engineering, 8(4), 61–70. https://doi.org/10.5815/ijeme.2018.04.06