Radoop: Analyzing Big Data with RapidMiner and Hadoop

  • Prekopcsák Z
  • Makrai G
  • Henk T
 et al. 
Abstract

Working with large data sets is increasingly common in research and industry. Distributed data analytics solutions such as Hadoop offer high scalability and fault tolerance, but they usually lack a user interface, so only developers can exploit their functionality. In this paper, we present Radoop, an extension for the RapidMiner data mining tool that provides easy-to-use operators for running distributed processes on Hadoop. We describe integration and development details and provide runtime measurements for several data transformation tasks. We conclude that Radoop is an excellent tool for big data analytics and scales well with increasing data set size and with the number of nodes in the cluster.

Authors

  • Zoltán Prekopcsák

  • Gabor Makrai

  • Tamas Henk

  • Csaba Gáspár-Papanek
