An Intelligent Distributed K-means Algorithm over Cloudera /Hadoop

  • Ayoub Shaikh T
  • Badr Shafeeque U
  • et al.

Abstract

The 21st century has brought a tsunami of data generated by human civilization, adding terms such as Big Data to the vocabulary. Digitization has overtaken almost all major sectors and plays a dominant role as far as the virtual digital world is concerned. This in turn has made “Big Data” one of the most debated terms of the present decade. Big Data has left traditional relational database management systems (RDBMS) handicapped by the sheer size of the data and the speed at which it is created. The need to manage and process this huge, complex, heterogeneous data has once again followed the age-old rule that “necessity is the mother of invention”, leading to Hadoop MapReduce. This work applies the K-means clustering algorithm to a benchmark MRI dataset from the OASIS database, using WEKA, in order to cluster the data by visual similarity. The approach worked up to a threshold input size, beyond which WEKA failed with an “out of memory” error. To overcome this problem, a Map/Reduce version of K-means is implemented on top of Hadoop using R. The algorithm is evaluated using the speedup, scaleup, and sizeup metrics, and its performance improves as the size of the input data increases.
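The Map/Reduce formulation of K-means mentioned in the abstract can be sketched as follows. This is only an illustrative, single-process sketch in Python, not the authors' R/Hadoop implementation: the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`), the Euclidean distance, and the convergence tolerance are assumptions made for the example. The three metric helpers use definitions that are common in the parallel clustering literature; the abstract does not state the exact formulas used in the paper.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, (partial_sum, count))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: combine partial sums and counts into a new centroid."""
    dims = len(values[0][0])
    total = [0.0] * dims
    count = 0
    for partial_sum, n in values:
        for d in range(dims):
            total[d] += partial_sum[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce pass; returns the updated centroids."""
    groups = defaultdict(list)  # stands in for Hadoop's shuffle phase
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid with no assigned points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else tuple(centroids[i])
            for i in range(len(centroids))]


def kmeans(points, centroids, max_iter=20, tol=1e-6):
    """Repeat passes until no centroid moves more than tol."""
    for _ in range(max_iter):
        new = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(new, centroids)):
            return new
        centroids = new
    return centroids


# Assumed (commonly used) metric definitions, all based on measured run times.
def speedup(t_one_node, t_m_nodes):
    """Same data set, 1 node versus m nodes."""
    return t_one_node / t_m_nodes


def scaleup(t_one_node_base_data, t_m_nodes_m_times_data):
    """1 node on the base data versus m nodes on m times the data."""
    return t_one_node_base_data / t_m_nodes_m_times_data


def sizeup(t_base_data, t_m_times_data):
    """Fixed number of nodes, base data versus m times the data."""
    return t_m_times_data / t_base_data
```

In a real Hadoop job the map and reduce functions would run on separate nodes and each pass would be one job submission; emitting a partial sum with a count also lets a combiner pre-aggregate points on each node before the shuffle.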

Cite

Ayoub Shaikh, T., Badr Shafeeque, U., & Ahamad, M. (2018). An Intelligent Distributed K-means Algorithm over Cloudera /Hadoop. International Journal of Education and Management Engineering, 8(4), 61–70. https://doi.org/10.5815/ijeme.2018.04.06
