Scalable work-stealing load-balancer for HPC distributed memory systems

Clement Fontenaille; Eric Petit; Pablo de Oliveira Castro; Seijilo Uemura; Devan Sohier; Piotr Lesnicki; Ghislain Lartigue; Vincent Moureau

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11339 LNCS 146-158

DOI: 10.1007/978-3-030-10549-5_12

Abstract

Work-stealing schedulers are common in shared-memory environments. In large-scale distributed-memory systems, however, their use has been limited to specific ad-hoc implementations, which has prevented broader adoption. In this paper we introduce a new scalable work-stealing algorithm for distributed-memory systems, together with its implementation in the TITUS_DLB library. The algorithm is based on Kleinberg’s small-world graph, which makes it possible to control communication patterns and the associated runtime overheads while providing efficient heuristics for victim selection and result routing. To validate our approach, we present the DLB_Bench benchmark, which emulates arbitrary workload distributions and imbalance characteristics. Finally, we compare TITUS_DLB to the ad-hoc solution developed for the YALES2 computational fluid dynamics and combustion solver. We achieve a performance gain of up to 54% on thousands of cores.
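The abstract does not describe how the small-world graph is built or how victims are chosen, so the following is only a minimal sketch of the general idea behind a Kleinberg-style small-world overlay, not the TITUS_DLB implementation. It assumes ranks arranged on a one-dimensional ring, long-range contacts drawn with probability proportional to d^(-r), where d is the ring distance, and a uniformly random victim choice among a rank's neighbours. The function names, the ring topology, and the parameters `k` and `r` are all hypothetical.

```python
import random


def ring_distance(a, b, n):
    """Shortest distance between ranks a and b on a ring of n ranks."""
    d = abs(a - b) % n
    return min(d, n - d)


def long_range_contacts(rank, n, k=2, r=1.0, rng=None):
    """Draw k long-range contacts for `rank` (assumes n >= 2).

    Each other rank v is chosen with probability proportional to
    ring_distance(rank, v) ** -r, as in Kleinberg's small-world model.
    Draws are made with replacement, so contacts may repeat.
    """
    rng = rng or random.Random()
    candidates = [v for v in range(n) if v != rank]
    weights = [ring_distance(rank, v, n) ** -r for v in candidates]
    return rng.choices(candidates, weights=weights, k=k)


def pick_victim(rank, n, contacts, rng=None):
    """Pick a steal victim uniformly among ring neighbours and contacts.

    This uniform choice is a placeholder assumption; the paper's actual
    victim-selection heuristic is not given in the abstract.
    """
    rng = rng or random.Random()
    neighbours = [(rank - 1) % n, (rank + 1) % n] + list(contacts)
    return rng.choice(neighbours)
```

In such a sketch, each rank would compute its contacts once at start-up and call `pick_victim` whenever its local task queue runs empty. Because every rank talks only to a small, fixed set of peers, the communication pattern stays bounded as the number of ranks grows.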

Cite (APA)

Fontenaille, C., Petit, E., de Oliveira Castro, P., Uemura, S., Sohier, D., Lesnicki, P., … Moureau, V. (2019). Scalable work-stealing load-balancer for HPC distributed memory systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11339 LNCS, pp. 146–158). Springer Verlag. https://doi.org/10.1007/978-3-030-10549-5_12
