A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks

Abstract

Hadoop MapReduce is increasingly being used by many datacenters (e.g., Facebook, Yahoo!) because of its simplicity, productivity, scalability, and fault tolerance. For MapReduce applications, achieving low job execution time is critical. Since a majority of existing clusters today are equipped with modern, high-speed interconnects such as InfiniBand and 10 GigE that offer high bandwidth and low communication latency, it is essential to study the impact of network configuration on the communication patterns of a MapReduce job. However, a standardized benchmark suite that helps users evaluate the performance of the stand-alone Hadoop MapReduce component is not available in the current Apache Hadoop community. In this paper, we propose a micro-benchmark suite that can be used to evaluate the performance of stand-alone Hadoop MapReduce with different intermediate data distribution patterns, varied key/value sizes, and data types. We also show how this micro-benchmark suite can be used to evaluate the performance of Hadoop MapReduce over different networks/protocols and parameter configurations on modern clusters. The micro-benchmark suite is designed to be compatible with both Hadoop 1.x and Hadoop 2.x.
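
To make the idea of "intermediate data distribution patterns" and "varied key/value sizes" concrete, the following is a minimal Python sketch, not the authors' implementation. The pattern names (`even`, `random`, `skewed`), the function names, and the 50% skew ratio are all assumptions chosen for illustration; the abstract does not specify them. The sketch only counts how many map-output records each reducer would receive under each pattern, which is the quantity that shapes shuffle traffic over the network.

```python
import random
from collections import Counter


def assign_reducers(num_records, num_reducers, pattern, seed=0):
    """Count how many intermediate records each reducer would receive.

    The three patterns are hypothetical examples, not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter()
    for i in range(num_records):
        if pattern == "even":
            # Round-robin: every reducer gets the same share.
            reducer = i % num_reducers
        elif pattern == "random":
            # Each record goes to a pseudo-randomly chosen reducer.
            reducer = rng.randrange(num_reducers)
        elif pattern == "skewed":
            # Assumed skew: half of all records go to reducer 0,
            # the rest are spread round-robin over the other reducers.
            if i % 2 == 0 or num_reducers == 1:
                reducer = 0
            else:
                reducer = 1 + (i // 2) % (num_reducers - 1)
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        counts[reducer] += 1
    return counts


def make_record(key_size, value_size, rng):
    """Build one key/value pair of the requested sizes in bytes."""
    key = bytes(rng.getrandbits(8) for _ in range(key_size))
    value = bytes(rng.getrandbits(8) for _ in range(value_size))
    return key, value


if __name__ == "__main__":
    for pattern in ("even", "random", "skewed"):
        print(pattern, dict(sorted(assign_reducers(1000, 4, pattern).items())))
    key, value = make_record(10, 100, random.Random(1))
    print(len(key), len(value))
```

In a real Hadoop job, this kind of assignment would typically live in a custom partitioner, and the per-reducer counts would translate into uneven shuffle volume that different interconnects handle differently.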

Cite

Shankar, D., Lu, X., Wasi-ur-Rahman, M., Islam, N., & Panda, D. K. (2014). A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance networks. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8807, 19–33. https://doi.org/10.1007/978-3-319-13021-7_2
