Hadoop MapReduce is increasingly used by many data centers (e.g., Facebook, Yahoo!) because of its simplicity, productivity, scalability, and fault tolerance. For MapReduce applications, achieving low job execution time is critical. Since a majority of today's clusters are equipped with modern, high-speed interconnects such as InfiniBand and 10 GigE, which offer high bandwidth and low communication latency, it is essential to study how the network configuration affects the communication patterns of MapReduce jobs. However, the Apache Hadoop community currently offers no standardized benchmark suite that helps users evaluate the performance of the stand-alone Hadoop MapReduce component. In this paper, we propose a micro-benchmark suite that can be used to evaluate the performance of stand-alone Hadoop MapReduce with different intermediate data distribution patterns, varied key/value sizes, and different data types. We also show how this micro-benchmark suite can be used to evaluate the performance of Hadoop MapReduce over different networks/protocols and parameter configurations on modern clusters. The micro-benchmark suite is designed to be compatible with both Hadoop 1.x and Hadoop 2.x.
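The abstract does not say how the intermediate data distribution patterns are generated. As a rough illustration only, the Python sketch below models how three hypothetical patterns ("average", "random", and "skewed") would spread intermediate key/value pairs of configurable key and value sizes across reducers. That spread determines how many bytes each reducer pulls over the network during the shuffle. The function names, the pattern names, and the 50% skew ratio are all assumptions made for this example. They are not the suite's actual design or API. A real benchmark would place this logic inside a Hadoop `Mapper` and `Partitioner` rather than in a standalone script.

```python
import random
from typing import List, Tuple


def assign_reducers(num_pairs: int, num_reducers: int, pattern: str,
                    seed: int = 0) -> List[int]:
    """Choose a destination reducer for each intermediate key/value pair.

    The pattern names are hypothetical, chosen for illustration only.
    """
    if num_reducers < 1:
        raise ValueError("need at least one reducer")
    rng = random.Random(seed)
    if pattern == "average":
        # Round-robin: every reducer receives the same number of pairs (+/- 1).
        return [i % num_reducers for i in range(num_pairs)]
    if pattern == "random":
        # Independent uniform choice per pair: balanced only in expectation.
        return [rng.randrange(num_reducers) for _ in range(num_pairs)]
    if pattern == "skewed":
        # Assumed skew: reducer 0 receives every other pair (half the data),
        # and the remaining pairs go round-robin to the other reducers.
        if num_reducers < 2:
            raise ValueError("skewed pattern needs at least two reducers")
        return [0 if i % 2 == 0 else 1 + (i // 2) % (num_reducers - 1)
                for i in range(num_pairs)]
    raise ValueError(f"unknown pattern: {pattern!r}")


def make_pair(key_size: int, value_size: int,
              rng: random.Random) -> Tuple[bytes, bytes]:
    """Build one key/value pair with fixed byte sizes (the size knobs)."""
    key = bytes(rng.getrandbits(8) for _ in range(key_size))
    value = bytes(rng.getrandbits(8) for _ in range(value_size))
    return key, value


def shuffle_volume(num_pairs: int, num_reducers: int, pattern: str,
                   key_size: int, value_size: int, seed: int = 0) -> List[int]:
    """Return the number of bytes each reducer would fetch in the shuffle."""
    rng = random.Random(seed)
    volume = [0] * num_reducers
    for reducer in assign_reducers(num_pairs, num_reducers, pattern, seed):
        key, value = make_pair(key_size, value_size, rng)
        volume[reducer] += len(key) + len(value)
    return volume


if __name__ == "__main__":
    for name in ("average", "random", "skewed"):
        print(name, shuffle_volume(1000, 4, name, key_size=16, value_size=112))
```

With 1,000 pairs of 128 bytes each and 4 reducers, the "average" pattern gives every reducer 32,000 bytes. The assumed "skewed" pattern sends 64,000 bytes to reducer 0 and about 21,000 bytes to each of the others. Varying the pattern, the key/value sizes, and the interconnect while holding everything else fixed is the kind of controlled comparison that a stand-alone MapReduce micro-benchmark makes possible.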