MapReduce Parallel Programming Model: A State-of-the-Art Survey

Abstract

With the development of information technologies, we have entered the era of Big Data. Google's MapReduce programming model and its open-source implementation, Apache Hadoop, have become the dominant paradigm for data-intensive processing because of their simplicity, scalability, and fault tolerance. However, several inherent limitations, such as the lack of efficient scheduling and of mechanisms for iterative computation, seriously affect the efficiency and flexibility of MapReduce. To date, various approaches have been proposed to extend the MapReduce model and improve its runtime efficiency in different scenarios. In this review, we assess MapReduce to help researchers better understand the optimizations that have been proposed to address its limitations. We first present the basic idea underlying the MapReduce paradigm and describe several widely used open-source runtime systems. We then discuss the main shortcomings of the original MapReduce model, review the optimization approaches that have recently been put forward, and categorize them according to their characteristics and capabilities. Finally, we conclude the paper and suggest several directions for future research.
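
To make the paradigm concrete, the sketch below shows the canonical word-count job written against the Hadoop MapReduce Java API: the map function emits a (word, 1) pair for every token, the framework shuffles the pairs by key, and the reduce function sums the counts for each word. This is an illustrative example, not code from the survey itself; the class names and the input/output paths passed on the command line are placeholders.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: after the shuffle, sum the counts received for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Reusing the reducer as a combiner performs local aggregation
    // on each map node before the shuffle, reducing network traffic.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // placeholder input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // placeholder output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}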

Citation (APA)

Li, R., Hu, H., Li, H., Wu, Y., & Yang, J. (2016, August 1). MapReduce parallel programming model: A state-of-the-art survey. International Journal of Parallel Programming. Springer New York LLC. https://doi.org/10.1007/s10766-015-0395-0
