Scalable work-stealing load-balancer for HPC distributed memory systems

0Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Work-stealing schedulers are common in shared memory environments. However, large scale distributed memory usage has been limited to specific ad-hoc implementations preventing a broader adoption. In this paper we introduce a new scalable work-stealing algorithm for distributed memory systems as well as our implementation as the TITUS_DLB library. It is based on Kleinberg’s small-world graph. It allows to control the communication patterns and associated runtime overheads while providing efficient heuristics for victim selection and results routing. To validate our approach, we present the DLB_Bench benchmark which emulates arbitrary workload distribution and imbalance characteristics. Finally, we compare TITUS_DLB to the ad-hoc solution developed for the YALES2 computational fluid dynamics and combustion solver. We achieve up to 54% performance gain over thousands of cores.

Cite

CITATION STYLE

APA

Fontenaille, C., Petit, E., de Oliveira Castro, P., Uemura, S., Sohier, D., Lesnicki, P., … Moureau, V. (2019). Scalable work-stealing load-balancer for HPC distributed memory systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11339 LNCS, pp. 146–158). Springer Verlag. https://doi.org/10.1007/978-3-030-10549-5_12

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free