An efficient solution for processing skewed MapReduce jobs

2Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Although MapReduce has been praised for its high scalability and fault tolerance, it has been criticized in some points, in particular, its poor performance in the case of data skew. There are important cases where a high percentage of processing in the reduce side is done by a few nodes, or even one node, while the others remain idle. There have been some attempts to address the problem of data skew, but only for specific cases. In particular, there is no proposed solution for the cases where most of the intermediate values correspond to a single key, or when the number of keys is less than the number of reduce workers. In this paper, we propose FP-Hadoop, a system that makes the reduce side of MapReduce more parallel, and efficiently deals with the problem of data skew in the reduce side. In FP-Hadoop, there is a new phase, called intermediate reduce (IR), in which blocks of intermediate values, constructed dynamically, are processed by intermediate reduce workers in parallel, by using a scheduling strategy. By using the IR phase, even if all intermediate values belong to only one key, the main part of the reducing work can be done in parallel by using the computing resources of all available workers. We implemented a prototype of FP-Hadoop, and conducted extensive experiments over synthetic and real datasets. We achieved excellent performance gains compared to native Hadoop, e.g. more than 10 times in reduce time and 5 times in total execution time.

Author supplied keywords

Cite

CITATION STYLE

APA

Akbarinia, R., Liroz-Gistau, M., Agrawal, D., & Valduriez, P. (2015). An efficient solution for processing skewed MapReduce jobs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9262, pp. 417–429). Springer Verlag. https://doi.org/10.1007/978-3-319-22852-5_35

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free