Hadoop MapReduce is increasingly being used by many data centers (e.g., Facebook, Yahoo!) because of its simplicity, productivity, scalability, and fault tolerance. For MapReduce applications, achieving low job execution time is critical. Since a majority of the existing clusters today are equipped with modern, high-speed interconnects such as InfiniBand and 10 GigE, which offer high bandwidth and low communication latency, it is essential to study the impact of network configuration on the communication patterns of MapReduce jobs. However, the current Apache Hadoop community offers no standardized benchmark suite focused on helping users evaluate the performance of the stand-alone Hadoop MapReduce component. In this paper, we propose a micro-benchmark suite that can be used to evaluate the performance of stand-alone Hadoop MapReduce with different intermediate data distribution patterns, varied key/value sizes, and data types. We also show how this micro-benchmark suite can be used to evaluate the performance of Hadoop MapReduce over different networks/protocols and parameter configurations on modern clusters. The micro-benchmark suite is designed to be compatible with both Hadoop 1.x and Hadoop 2.x.
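To make "intermediate data distribution patterns" and "key/value sizes" concrete, the following is a minimal sketch, not the suite's actual implementation. It assumes three hypothetical patterns: "average" (pairs spread evenly over reducers), "random" (each pair sent to a uniformly random reducer), and "skew" (a fixed fraction of pairs sent to one hot reducer). The function names `assign_reducers` and `shuffle_bytes_per_reducer` and the `skew_fraction` parameter are illustrative assumptions; the paper's own definitions may differ. The sketch only estimates how many bytes each reducer would fetch during the shuffle, which is the traffic the network configuration affects.

```python
import random
from collections import Counter


def assign_reducers(num_pairs, num_reducers, pattern, skew_fraction=0.5, seed=0):
    """Map each intermediate key/value pair (by index) to a reducer id.

    Hypothetical patterns, assumed for illustration only:
      "average": round-robin, so every reducer gets an (almost) equal share
      "random":  each pair goes to a uniformly random reducer
      "skew":    skew_fraction of all pairs go to reducer 0, the rest round-robin
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    if pattern == "average":
        return [i % num_reducers for i in range(num_pairs)]
    if pattern == "random":
        return [rng.randrange(num_reducers) for _ in range(num_pairs)]
    if pattern == "skew":
        hot = int(num_pairs * skew_fraction)
        return [0] * hot + [i % num_reducers for i in range(num_pairs - hot)]
    raise ValueError(f"unknown pattern: {pattern}")


def shuffle_bytes_per_reducer(assignment, num_reducers, key_size, value_size):
    """Estimate bytes each reducer fetches, ignoring serialization overhead."""
    counts = Counter(assignment)
    return [counts.get(r, 0) * (key_size + value_size) for r in range(num_reducers)]


if __name__ == "__main__":
    for pattern in ("average", "random", "skew"):
        plan = assign_reducers(1000, 4, pattern)
        print(pattern, shuffle_bytes_per_reducer(plan, 4, key_size=10, value_size=90))
```

With 1,000 pairs of 100 bytes each over 4 reducers, the "average" pattern gives every reducer 25,000 bytes, while "skew" with `skew_fraction=0.5` sends 62,500 bytes to reducer 0 and 12,500 bytes to each of the others. Comparing such layouts across interconnects shows why skewed shuffles can stress a single link even when total traffic is unchanged.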
B, D. S., Lu, X., & Islam, N. (2014). Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE-5: The 5th Workshop on Big Data Benchmark, Performance Optimization and Emerging Hardware, 8807, 19–33. Retrieved from http://link.springer.com/10.1007/978-3-319-13021-7