Apache Spark provides a well-known MapReduce-style computing framework that aims to process big-data analytics quickly in a data-parallel manner. On this platform, large input datasets are divided into partitions, which are processed concurrently by multiple computation tasks, and the outputs of these tasks are transferred among multiple computers over the network. However, such a distributed computing framework suffers from system overheads, inevitably caused by communication and disk I/O operations, and these overheads take up a large proportion of the Job Completion Time (JCT). We observed that allocating excessive computational resources incurs considerable system overhead and prolongs the JCT. Over-allocating resources to individual jobs not only prolongs their own JCTs but also tends to leave other jobs under-allocated, so the average JCT becomes suboptimal as well. To address this problem, we propose a prediction model that estimates how the JCT of a single Spark job changes with its resource allocation. Building on this prediction model, we design a heuristic algorithm that balances resource allocation among multiple Spark jobs, aiming to minimize the average JCT in multi-job scenarios. We implemented the prediction model and the resource allocation method in ReB, a Resource Balancer built on Apache Spark. Experimental results show that ReB significantly outperforms the traditional max-min fairness and shortest-job-optimal methods, reducing the average JCT by around 10%-30% compared with existing solutions.
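
To make the balancing idea concrete, the following sketch pairs a toy JCT model (a compute term that shrinks with parallelism plus an overhead term that grows with it) with a greedy core-allocation loop over the predicted JCTs. Both the model and the greedy heuristic, along with the hypothetical Job fields work and overhead, are illustrative assumptions for exposition; they are not the paper's actual prediction model or algorithm.

    # Illustrative sketch only: the JCT model and greedy heuristic below are
    # assumptions for exposition, not the paper's prediction model or algorithm.
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        work: float      # hypothetical: total compute time on one core (seconds)
        overhead: float  # hypothetical: per-core communication/disk-I/O cost (seconds)

    def predicted_jct(job: Job, cores: int) -> float:
        # Toy model: compute time shrinks with parallelism while system
        # overhead (shuffle, disk I/O) grows with the number of cores.
        return job.work / cores + job.overhead * cores

    def balance_allocation(jobs: list, total_cores: int) -> dict:
        # Greedy heuristic: start every job at one core, then repeatedly give
        # the next core to the job whose predicted JCT drops the most.
        alloc = {job.name: 1 for job in jobs}
        for _ in range(total_cores - len(jobs)):
            best_job, best_gain = None, 0.0
            for job in jobs:
                c = alloc[job.name]
                gain = predicted_jct(job, c) - predicted_jct(job, c + 1)
                if gain > best_gain:
                    best_job, best_gain = job, gain
            if best_job is None:  # another core would prolong every job's JCT
                break             # (the over-allocation effect described above)
            alloc[best_job.name] += 1
        return alloc

    if __name__ == "__main__":
        jobs = [Job("etl", work=600.0, overhead=0.5),
                Job("report", work=120.0, overhead=2.0)]
        for name, cores in balance_allocation(jobs, 32).items():
            job = next(j for j in jobs if j.name == name)
            print(f"{name}: {cores} cores, predicted JCT {predicted_jct(job, cores):.1f}s")

The loop's refusal to hand out a core once it would raise every job's predicted JCT mirrors the abstract's observation that excessive resources inflate system overheads rather than speed jobs up.
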
CITATION STYLE
Hu, Z., Li, D., & Guo, D. (2020). Balance resource allocation for spark jobs based on prediction of the optimal resource. Tsinghua Science and Technology, 25(4), 487–497. https://doi.org/10.26599/TST.2019.9010054