Performant Portable OpenMP

6Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Accelerated computing has increased the need to specialize how a program is parallelized depending on the target. Fully exploiting a highly parallel accelerator, such as a GPU, demands more parallelism and sometimes more levels of parallelism than a multicore CPU. OpenMP has a directive for each level of parallelism, but choosing directives for each target can incur a significant productivity cost. We argue that using the new OpenMP loop directive with an appropriate compiler decision process can achieve the same performance benefits of target-specific parallelization with the productivity advantage of a single directive for all targets. In this paper, we introduce a fully descriptive model and demonstrate its benefits with an implementation of the loop directive, comparing performance, productivity, and portability against other production compilers using the SPEC ACCEL benchmark suite. We provide an implementation of our proposal in NVIDIA's HPC compiler. It yields up to 56X speedup and an average of 1.91x-1.79x speedup compared to the baseline performance (depending on the host system) on GPUs, and preserves CPU performance. In addition, our proposal requires 60% fewer parallelism directives.

Cite

CITATION STYLE

APA

Ozen, G., & Wolfe, M. (2022). Performant Portable OpenMP. In CC 2022 - Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction (pp. 156–168). Association for Computing Machinery, Inc. https://doi.org/10.1145/3497776.3517780

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free