Working with large data sets is increasingly common in research and industry. Distributed data analytics solutions such as Hadoop offer high scalability and fault tolerance, but they usually lack a user interface, so only developers can exploit their functionality. In this paper, we present Radoop, an extension for the RapidMiner data mining tool that provides easy-to-use operators for running distributed processes on Hadoop. We describe integration and development details and provide runtime measurements for several data transformation tasks. We conclude that Radoop is an excellent tool for big data analytics and that it scales well with both increasing data set size and the number of nodes in the cluster.