MapCombine: A lightweight solution to improve the efficiency of iterative MapReduce


Abstract

MapReduce is a brilliant distributed computing strategy for processing massive-scale data. However, for iterative applications, general MapReduce must re-initialize the runtime environment and reload static data in every iteration, which wastes a great deal of CPU time and I/O bandwidth. This paper presents MapCombine, a lightweight solution that improves the efficiency of iterative MapReduce. The main contributions of MapCombine are as follows: (1) to avoid re-initializing the runtime environment, a controller component is plugged into the general MapReduce model to schedule the iterations; (2) to process data without reloading the static subset, we modify the general MapReduce model around the combine phase to cache the fixed data and 4e the workload before processing; (3) to make communication between the controller and the combiners flexible while accounting for fault tolerance and downtime recovery, we append an interaction layer to the MapReduce implementation architecture. We also compare the performance of MapCombine and Mahout on four clustering algorithms and conclude that MapCombine provides an average speedup ratio of 1.14. © Springer-Verlag Berlin Heidelberg 2012.
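
The idea behind contributions (1) and (2) can be illustrated with a small single-process sketch. It is not the authors' implementation and uses no Hadoop or Mahout API: the names `CachingWorker`, `map_combine`, and `run_controller` are hypothetical, and k-means on 2-D points is assumed as the iterative clustering workload. Each worker loads its static partition once and keeps it in memory, while a controller drives the iterations over the same long-lived workers.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Partial = Dict[int, Tuple[float, float, int]]  # centroid id -> (sum_x, sum_y, count)


class CachingWorker:
    """Hypothetical worker that loads its static partition once and reuses it."""

    def __init__(self, partition: List[Point]) -> None:
        self.partition = partition  # static data, cached across iterations
        self.loads = 1              # how many times the partition was loaded

    def map_combine(self, centroids: List[Point]) -> Partial:
        """Assign cached points to the nearest centroid and pre-aggregate locally."""
        partial: Partial = {}
        for x, y in self.partition:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
            )
            sx, sy, n = partial.get(nearest, (0.0, 0.0, 0))
            partial[nearest] = (sx + x, sy + y, n + 1)
        return partial


def run_controller(
    workers: List[CachingWorker],
    centroids: List[Point],
    max_iters: int = 20,
    tol: float = 1e-9,
) -> List[Point]:
    """Hypothetical controller: schedules iterations without recreating workers."""
    for _ in range(max_iters):
        totals: Partial = {}
        # Merge the locally combined partial sums from every worker.
        for worker in workers:
            for cid, (sx, sy, n) in worker.map_combine(centroids).items():
                tx, ty, tn = totals.get(cid, (0.0, 0.0, 0))
                totals[cid] = (tx + sx, ty + sy, tn + n)
        # Recompute centroids; a centroid with no assigned points keeps its position.
        new_centroids = [
            (totals[i][0] / totals[i][2], totals[i][1] / totals[i][2]) if i in totals else c
            for i, c in enumerate(centroids)
        ]
        shift = max(
            abs(a - c) + abs(b - d) for (a, b), (c, d) in zip(new_centroids, centroids)
        )
        centroids = new_centroids
        if shift <= tol:  # converged: stop scheduling further iterations
            break
    return centroids
```

In this sketch only the centroids, which change between iterations, travel between the controller and the workers; the partitions stay where they were first loaded. A plain iterative MapReduce job would instead start a new job and reread the input for every pass.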

Cite

Xu, W., Gong, X., & Li, X. (2013). MapCombine: A lightweight solution to improve the efficiency of iterative MapReduce. Communications in Computer and Information Science, 332, 444–456. https://doi.org/10.1007/978-3-642-34447-3_40
