Big Data Benchmarks, Performance Optimization, and Emerging Hardware

  • B D
  • Lu X
  • Islam N
ISSN: 16113349

Abstract

Hadoop MapReduce is increasingly being used by many data centers (e.g., Facebook, Yahoo!) because of its simplicity, productivity, scalability, and fault tolerance. For MapReduce applications, achieving low job execution time is critical. Since a majority of existing clusters today are equipped with modern, high-speed interconnects such as InfiniBand and 10 GigE, which offer high bandwidth and low communication latency, it is essential to study the impact of network configuration on the communication patterns of a MapReduce job. However, a standardized benchmark suite that helps users evaluate the performance of the stand-alone Hadoop MapReduce component is not available in the current Apache Hadoop community. In this paper, we propose a micro-benchmark suite that can be used to evaluate the performance of stand-alone Hadoop MapReduce with different intermediate data distribution patterns, varied key/value sizes, and data types. We also show how this micro-benchmark suite can be used to evaluate the performance of Hadoop MapReduce over different networks/protocols and parameter configurations on modern clusters. The micro-benchmark suite is designed to be compatible with both Hadoop 1.x and Hadoop 2.x.

Cite

B, D. S., Lu, X., & Islam, N. (2014). Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE-5: The 5th Workshop on Big Data Benchmark, Performance Optimization and Emerging Hardware, 8807, 19–33. Retrieved from http://link.springer.com/10.1007/978-3-319-13021-7
