A pattern for overlapping communication and computation with OpenMP* target directives

Abstract

OpenMP* 4.0 introduced initial support for heterogeneous devices. OpenMP 4.5 improved programmability and added capabilities for asynchronous device kernel offload and data transfer management. However, programmers still bear the burden of optimizing data transfers for performance and of working within the limited memory of the target device. This work presents a pipelining concept that efficiently overlaps communication and computation using the OpenMP 4.5 target directives. Our evaluation of two key HPC kernels shows performance improvements of up to 24% and the ability to process data sets larger than device memory.
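The abstract does not show code, so the following is only a minimal sketch of how such a pipeline might look with OpenMP 4.5 directives. It is not the authors' implementation. The function name `pipelined_scale`, the constants `N` and `CHUNKS`, and the trivial "multiply by two" kernel are all assumptions made for illustration. The array is split into chunks, and each chunk passes through three asynchronous stages (copy in, compute, copy out) that are ordered by `depend` clauses on that chunk's first element. Stages of different chunks carry no mutual dependences, which leaves the runtime free to overlap one chunk's transfer with another chunk's kernel.

```c
#include <assert.h>

#define N 1024      /* illustrative problem size (assumption) */
#define CHUNKS 8    /* illustrative number of pipeline chunks (assumption) */

/* Hypothetical helper: doubles every element of a, one chunk at a time. */
static void pipelined_scale(double *a, int n, int chunks)
{
    assert(chunks > 0 && n % chunks == 0); /* sketch assumes equal-sized chunks */
    int len = n / chunks;

    for (int c = 0; c < chunks; ++c) {
        int off = c * len;

        /* Stage 1: asynchronous host-to-device copy of this chunk. */
        #pragma omp target enter data map(to: a[off:len]) depend(out: a[off]) nowait

        /* Stage 2: kernel on this chunk, ordered after its copy-in.
         * The chunk is already present on the device, so this map
         * clause triggers no additional transfer. */
        #pragma omp target teams distribute parallel for \
                map(tofrom: a[off:len]) depend(inout: a[off]) nowait
        for (int i = off; i < off + len; ++i)
            a[i] *= 2.0;

        /* Stage 3: asynchronous device-to-host copy, ordered after the
         * kernel; it also releases the chunk's device memory. */
        #pragma omp target exit data map(from: a[off:len]) depend(in: a[off]) nowait
    }

    /* Wait for all outstanding target tasks before returning. */
    #pragma omp taskwait
}
```

Calling `pipelined_scale(a, N, CHUNKS)` on an array of `N` doubles leaves every element doubled. Without OpenMP support, or without an available device, the pragmas are ignored or fall back to the host and the loop produces the same result serially. Because each chunk's device buffer is released in stage 3, this structure is one way to keep only part of the data resident at a time. Strictly bounding device memory would additionally require limiting how many chunks are in flight, which this sketch does not do.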

Hahnfeld, J., Cramer, T., Klemm, M., Terboven, C., & Müller, M. S. (2017). A pattern for overlapping communication and computation with OpenMP* target directives. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10468 LNCS, pp. 325–337). Springer Verlag. https://doi.org/10.1007/978-3-319-65578-9_22