Overlapping network communications with computations is a major requirement to ensure the scalability of HPC applications on future exascale machines. To this end, the de facto MPI standard provides non-blocking routines for asynchronous communication progress. In several implementations, a dedicated progress thread (PT) is deployed on the host CPU to actually achieve this overlap. However, current PT solutions struggle to find a balance between efficient detection of network events and minimal impact on application computations. In this paper we propose a solution, inspired by the PT approach, that exploits the idle time of compute threads to progress MPI communications in the background. We implement our idea in the context of MPI+OpenMP collaboration using the OpenMP Tools (OMPT) interface, which will be part of the OpenMP 5.0 standard. Our solution shows an overall performance gain on unbalanced workloads such as the AMG CORAL benchmark.
Sergent, M., Dagrada, M., Carribault, P., Jaeger, J., Pérache, M., & Papauré, G. (2018). Efficient Communication/Computation Overlap with MPI+OpenMP Runtimes Collaboration. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11014 LNCS, pp. 560–572). Springer Verlag. https://doi.org/10.1007/978-3-319-96983-1_40