Why SQL on Big Data?

  • Pal S

Abstract

Hadoop MapReduce is increasingly being used by many data centers (e.g. Facebook, Yahoo!) because of its simplicity, productivity, scalability, and fault tolerance. For MapReduce applications, achieving low job execution time is critical. Since a majority of the existing clusters today are equipped with modern, high-speed interconnects such as InfiniBand and 10 GigE, which offer high bandwidth and low communication latency, it is essential to study the impact of network configuration on the communication patterns of the MapReduce job. However, a standardized benchmark suite that focuses on helping users evaluate the performance of the stand-alone Hadoop MapReduce component is not available in the current Apache Hadoop community. In this paper, we propose a micro-benchmark suite that can be used to evaluate the performance of stand-alone Hadoop MapReduce, with different intermediate data distribution patterns, varied key/value sizes, and data types. We also show how this micro-benchmark suite can be used to evaluate the performance of Hadoop MapReduce over different networks/protocols and parameter configurations on modern clusters. The micro-benchmark suite is designed to be compatible with both Hadoop 1.x and Hadoop 2.x.

Cite

Pal, S. (2016). Why SQL on Big Data? In SQL on Big Data (pp. 1–15). Apress. https://doi.org/10.1007/978-1-4842-2247-8_1
