Mapreduce performance optimization based on block aggregation

Jun Li; Lihua Ai; Ding Ding

Journal Article

Advances in Intelligent Systems and Computing (2014) 255 853-861

DOI: 10.1007/978-81-322-1759-6_97

Abstract

MapReduce is a distributed programming model for large-scale data processing. Hadoop, an open-source implementation of the MapReduce programming model, has been widely used because of its good scalability and fault tolerance. However, by default the split size equals the Hadoop Distributed File System (HDFS) block size, so the number of map tasks in a job increases linearly with the number of blocks. When the input is large, the time spent managing splits and initializing map tasks is considerable. In this paper, we propose a scheme, Block Aggregation MapReduce (BAMR), which automatically increases the split size according to the input’s size in order to reduce the number of map tasks. With this scheme, the time spent managing splits and initializing map tasks is shortened. Experiments show that BAMR reduces the execution time significantly.
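
The relationship between split size and map task count described in the abstract can be illustrated with a small sketch. The function compute_split_size mirrors the formula Hadoop's FileInputFormat uses, max(minSize, min(maxSize, blockSize)). The function aggregated_split_size and its target_maps parameter are hypothetical: the abstract does not state how BAMR chooses the split size, so the rule below is only an assumed stand-in, not the authors' algorithm. The sketch also treats the input as one large file and ignores Hadoop's handling of the final partial split.

```python
def compute_split_size(block_size, min_size, max_size):
    # Same formula as Hadoop's FileInputFormat:
    # max(minSize, min(maxSize, blockSize)).
    return max(min_size, min(max_size, block_size))


def num_map_tasks(input_size, split_size):
    # One map task per split; integer ceiling division.
    # Simplification: a single file, no special case for the last split.
    return -(-input_size // split_size)


def aggregated_split_size(input_size, block_size, target_maps):
    # HYPOTHETICAL rule (not taken from the paper): group whole blocks
    # into each split so the map task count stays at or below target_maps.
    blocks = -(-input_size // block_size)
    blocks_per_split = max(1, -(-blocks // target_maps))
    return blocks_per_split * block_size


MB = 1024 * 1024
block = 64 * MB                    # assumed HDFS block size
input_size = 100 * 1024 * MB       # assumed 100 GB input

# Default behaviour: split size equals block size.
default_split = compute_split_size(block, 1, 2**63 - 1)
default_maps = num_map_tasks(input_size, default_split)

# Aggregated behaviour under the assumed rule.
agg_split = aggregated_split_size(input_size, block, target_maps=400)
agg_maps = num_map_tasks(input_size, agg_split)

print(default_split // MB, default_maps)
print(agg_split // MB, agg_maps)
```

Under these assumed numbers, the default configuration yields one map task per 64 MB block, while aggregating four blocks per split cuts the task count to a quarter. In stock Hadoop a similar effect can be obtained by hand by raising the minimum split size (the mapreduce.input.fileinputformat.split.minsize property in Hadoop 2 and later); the abstract's contribution is choosing the size automatically from the input size.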

Cite

Li, J., Ai, L., & Ding, D. (2014). Mapreduce performance optimization based on block aggregation. Advances in Intelligent Systems and Computing, 255, 853–861. https://doi.org/10.1007/978-81-322-1759-6_97