MapReduce performance optimization based on block aggregation

Abstract

MapReduce is a distributed programming model for large-scale data processing. Hadoop, an open-source implementation of the MapReduce programming model, has been widely used because of its good scalability and fault tolerance. However, the default split size equals the Hadoop Distributed File System (HDFS) block size, so the number of map tasks in a job increases linearly with the number of blocks. When the input is large, the time spent managing splits and initializing map tasks is considerable. In this paper, we propose a scheme, Block Aggregation MapReduce (BAMR), which automatically increases the split size according to the input size in order to reduce the number of map tasks. With this scheme, the time spent managing splits and initializing map tasks is shortened. Experimental results show that BAMR reduces the execution time significantly.
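The relationship between split size and map-task count described in the abstract can be illustrated with a small sketch. The split-size formula mirrors the one used by Hadoop's `FileInputFormat` (the larger of the minimum size and the smaller of the maximum size and the block size). Everything else is an assumption made for illustration: the `aggregated_split_size` rule, its `target_tasks` parameter, and the 100 GiB / 64 MiB figures are hypothetical and are not taken from the paper, whose actual aggregation policy is not given in the abstract.

```python
MB = 1024 * 1024


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Same shape as Hadoop's FileInputFormat formula:
    # max(minSize, min(maxSize, blockSize)).
    # With default limits the split size equals the block size.
    return max(min_size, min(max_size, block_size))


def num_map_tasks(input_size, split_size):
    # One map task per split. This ignores Hadoop's slop factor,
    # which lets the last split be slightly larger than split_size.
    return ceil_div(input_size, split_size)


def aggregated_split_size(input_size, block_size, target_tasks):
    # HYPOTHETICAL rule, not the paper's: group whole blocks into each
    # split so that the number of map tasks does not exceed target_tasks.
    blocks = ceil_div(input_size, block_size)
    blocks_per_split = max(1, ceil_div(blocks, target_tasks))
    return blocks_per_split * block_size


# Assumed example values: 100 GiB of input, 64 MiB blocks.
input_size = 100 * 1024 * MB
block_size = 64 * MB

default_tasks = num_map_tasks(input_size, compute_split_size(block_size))
agg_split = aggregated_split_size(input_size, block_size, target_tasks=200)
agg_tasks = num_map_tasks(input_size, agg_split)

print(default_tasks, agg_split // MB, agg_tasks)
```

With the default limits, the task count grows linearly with the number of blocks. Grouping several blocks into one split reduces the number of tasks to initialize, at the cost of larger and possibly less data-local tasks.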

Cite

Li, J., Ai, L., & Ding, D. (2014). MapReduce performance optimization based on block aggregation. Advances in Intelligent Systems and Computing, 255, 853–861. https://doi.org/10.1007/978-81-322-1759-6_97
