Improving the memory access locality of hybrid MPI applications

Abstract

Maintaining memory access locality continues to be a challenge for parallel applications and their runtime environments. Exploiting locality can improve application performance, resource usage, and performance portability. The main challenge is to detect and fix memory locality issues in applications that use shared-memory programming models for intra-node parallelization. In this paper, we investigate two ways of improving the memory access locality of a hybrid MPI+OpenMP application: manually fixing locality issues in its source code, and employing the Adaptive MPI (AMPI) runtime environment. Results show that AMPI can achieve locality improvements similar to those of manual source code changes, leading to substantial performance and scalability gains compared to the unoptimized version and to a pure MPI runtime. Compared to the hybrid MPI+OpenMP baseline, our optimizations improved performance by 1.8x on a single cluster node and by 1.4x on 32 nodes, with a speedup of 2.4x over a pure MPI execution on 32 nodes. In addition to performance, we also evaluate the impact of memory locality on load balance within a node.

Cite

Diener, M., White, S., Kale, L. V., Campbell, M., Bodony, D. J., & Freund, J. B. (2017). Improving the memory access locality of hybrid MPI applications. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3127024.3127038
