Improving communication by optimizing on-node data movement with data layout

Abstract

We present optimizations that improve communication performance by reducing on-node data movement for a class of distributed-memory applications. The primary concept is to eliminate the data movement associated with packing and unpacking subsets of the data during communication. As the rapid rise in network injection bandwidth reduces the cost of off-node data movement, on-node data movement can become significantly more expensive than computation and network communication. This data movement is especially costly for small domains, as in memory-intensive multi-physics codes or when strong scaling to reduce time-to-solution. The optimizations presented include (1) optimizing data layout through indirection to enable pack-free communication; (2) creating contiguous views of memory using memory mapping, thus minimizing the number of messages; and (3) applying these techniques to intra-node data movement, including CPU-GPU data movement. The benefits of these optimizations are demonstrated in stencil benchmarks against a highly optimized baseline, reducing communication time by up to 14.4×.
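To make optimization (1) concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a small 2D domain split into fixed-size blocks ("bricks"), with a lookup table (the indirection) that decides where each brick lives in storage. All names and sizes (`N`, `B`, `slot_of`, `east_view`) are hypothetical. In the row-major baseline, the east boundary is strided and must be copied into a send buffer; in the brick layout, the boundary bricks are placed next to each other, so the same cells can be handed to a communication call as one contiguous, zero-copy slice.

```python
from array import array

N = 8        # domain is N x N cells (hypothetical size)
B = 2        # brick edge length; also the boundary width in this sketch
NB = N // B  # bricks per dimension


def value(i, j):
    """Distinct value per cell so the two layouts can be compared."""
    return float(i * N + j)


# Baseline: row-major layout. The east boundary is strided in memory,
# so it has to be packed (copied) into a separate send buffer.
row_major = array("d", (value(i, j) for i in range(N) for j in range(N)))


def pack_east_row_major(data):
    """Copy the B rightmost columns, row by row, into a new buffer."""
    buf = array("d")
    for i in range(N):
        buf.extend(data[i * N + (N - B): i * N + N])
    return buf


# Brick layout with indirection: slot_of maps brick coordinates to a
# storage slot. Bricks in the east-most brick column get the first
# slots, so together they occupy one contiguous range of storage.
east = [(bi, NB - 1) for bi in range(NB)]
rest = [(bi, bj) for bi in range(NB) for bj in range(NB - 1)]
slot_of = {coord: slot for slot, coord in enumerate(east + rest)}


def offset(i, j):
    """Storage offset of cell (i, j) via the indirection table."""
    slot = slot_of[(i // B, j // B)]
    return slot * B * B + (i % B) * B + (j % B)


bricks = array("d", [0.0]) * (N * N)
for i in range(N):
    for j in range(N):
        bricks[offset(i, j)] = value(i, j)


def east_view(data):
    """Zero-copy view of the east boundary: one slice, no packing."""
    return memoryview(data)[: NB * B * B]
```

Both paths expose the same set of boundary cells, although in a different element order, so the receiving side would need to use the matching brick layout. Optimizations (2) and (3) from the abstract, memory-mapped contiguous views and CPU-GPU movement, are not shown here.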

Cite

Zhao, T., Hall, M., Johansen, H., & Williams, S. (2021). Improving communication by optimizing on-node data movement with data layout. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (pp. 304–317). Association for Computing Machinery. https://doi.org/10.1145/3437801.3441598
