Self-balancing job parallelism and throughput in Hadoop

Bo Zhang; Filip Křikava; Romain Rouvoy; Lionel Seinturier

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9687 129-143

DOI: 10.1007/978-3-319-39577-7_11

3 Citations

9 Readers

Abstract

In a Hadoop cluster, the performance and resource consumption of MapReduce jobs depend not only on the characteristics of the applications and workloads, but also on the appropriate setting of Hadoop configuration parameters. However, when job workloads are not known a priori or evolve over time, a static configuration may quickly lead to wasted computing resources and, consequently, to degraded performance. In this paper, we therefore propose an on-line approach that dynamically reconfigures Hadoop at runtime. Concretely, we focus on balancing job parallelism and throughput by adjusting the memory configuration of the Hadoop Capacity Scheduler. Our evaluation shows that the approach outperforms vanilla Hadoop deployments by up to 40% and the best statically profiled configurations by up to 13%.
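
To make the idea of runtime rebalancing more concrete, here is a minimal sketch of a feedback step that adjusts one memory-related knob. It is not the authors' algorithm: the abstract does not name the parameter or the control rule. The sketch assumes the knob is the fraction of cluster memory reserved for application masters (in the Capacity Scheduler this corresponds to `yarn.scheduler.capacity.maximum-am-resource-percent`), and the `ClusterSnapshot` type, the `next_am_fraction` function, and all thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterSnapshot:
    """Hypothetical view of cluster memory at one sampling instant."""
    total_mem_mb: int   # total memory managed by the scheduler
    am_mem_mb: int      # memory held by application masters (one per running job)
    task_mem_mb: int    # memory held by map/reduce task containers
    pending_jobs: int   # jobs waiting for an application master slot


def next_am_fraction(current: float, snap: ClusterSnapshot,
                     step: float = 0.05, lo: float = 0.05, hi: float = 0.95,
                     idle_threshold: float = 0.10) -> float:
    """Return the AM memory fraction to apply for the next interval.

    Illustrative heuristic (an assumption, not taken from the paper):
    - raise the fraction when jobs are queued, the AM budget is nearly
      exhausted, and a noticeable share of memory sits idle
      (too few jobs running, so parallelism is lost);
    - lower it when the cluster is nearly full and application masters
      hold more memory than the tasks doing the work
      (too many jobs running, so throughput is lost);
    - otherwise keep the current value.
    """
    am_limit_mb = current * snap.total_mem_mb
    used_mb = snap.am_mem_mb + snap.task_mem_mb
    idle = (snap.total_mem_mb - used_mb) / snap.total_mem_mb

    am_budget_exhausted = snap.am_mem_mb >= 0.9 * am_limit_mb
    if snap.pending_jobs > 0 and am_budget_exhausted and idle > idle_threshold:
        return round(min(hi, current + step), 2)
    if idle < idle_threshold and snap.am_mem_mb > snap.task_mem_mb:
        return round(max(lo, current - step), 2)
    return round(current, 2)


if __name__ == "__main__":
    # Jobs are queued while half of the memory is idle: allow more jobs.
    starved = ClusterSnapshot(100_000, 10_000, 40_000, pending_jobs=3)
    print(next_am_fraction(0.10, starved))

    # Cluster is full and mostly occupied by application masters: allow fewer jobs.
    crowded = ClusterSnapshot(100_000, 60_000, 35_000, pending_jobs=0)
    print(next_am_fraction(0.60, crowded))
```

In a real deployment the snapshot would come from cluster metrics and the returned value would be written back to the scheduler configuration. Both steps are left out here because the abstract does not describe them.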

Cite

Zhang, B., Křikava, F., Rouvoy, R., & Seinturier, L. (2016). Self-balancing job parallelism and throughput in Hadoop. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9687, pp. 129–143). Springer Verlag. https://doi.org/10.1007/978-3-319-39577-7_11
