Performant Portable OpenMP

Guray Ozen; Michael Wolfe

Conference Proceedings (Open Access)

CC 2022 - Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction (2022) 156-168

DOI: 10.1145/3497776.3517780

Abstract

Accelerated computing has increased the need to specialize how a program is parallelized for each target. Fully exploiting a highly parallel accelerator, such as a GPU, demands more parallelism, and sometimes more levels of parallelism, than a multicore CPU. OpenMP has a directive for each level of parallelism, but choosing directives for each target can incur a significant productivity cost. We argue that using the new OpenMP loop directive with an appropriate compiler decision process can achieve the same performance benefits as target-specific parallelization with the productivity advantage of a single directive for all targets. In this paper, we introduce a fully descriptive model and demonstrate its benefits with an implementation of the loop directive, comparing performance, productivity, and portability against other production compilers using the SPEC ACCEL benchmark suite. We provide an implementation of our proposal in NVIDIA's HPC compiler. On GPUs, it yields up to 56x speedup and an average speedup of 1.91x to 1.79x over the baseline (depending on the host system), while preserving CPU performance. In addition, our proposal requires 60% fewer parallelism directives.
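To make the contrast in the abstract concrete, the following is a minimal sketch that is not taken from the paper. It writes the same SAXPY loop twice: once with a prescriptive combined construct that names each level of parallelism, and once with the descriptive `loop` directive (available since OpenMP 5.0), which leaves the mapping to the compiler. The function names, the problem size `N`, and the helper `saxpy_results_match` are hypothetical and chosen only for illustration. The sketch assumes a compiler that supports `target teams loop`; without OpenMP enabled, the pragmas are ignored and both loops run serially.

```c
#include <stdio.h>

#define N 1000  /* illustrative problem size (assumption) */

/* Prescriptive style: the programmer names each level of parallelism
   (teams, then threads within a team). */
static void saxpy_prescriptive(float a, const float *x, float *y, int n) {
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Descriptive style: the loop directive only asserts that iterations may
   run concurrently; how they map to the target is left to the compiler. */
static void saxpy_descriptive(float a, const float *x, float *y, int n) {
    #pragma omp target teams loop \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical helper: runs both versions on identical inputs and
   returns 1 if both produce y[i] = 2*i + 1 for every i, else 0. */
int saxpy_results_match(void) {
    float x[N], y1[N], y2[N];
    for (int i = 0; i < N; ++i) {
        x[i] = (float)i;
        y1[i] = 1.0f;
        y2[i] = 1.0f;
    }
    saxpy_prescriptive(2.0f, x, y1, N);
    saxpy_descriptive(2.0f, x, y2, N);
    for (int i = 0; i < N; ++i) {
        float expected = 2.0f * (float)i + 1.0f;
        if (y1[i] != expected || y2[i] != expected)
            return 0;
    }
    return 1;
}
```

Calling `saxpy_results_match()` from a `main` function checks that the two variants agree. Whether they also perform equally well on a given target depends on the compiler's decision process, which is the subject of the paper.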

Cite

Ozen, G., & Wolfe, M. (2022). Performant Portable OpenMP. In CC 2022 - Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction (pp. 156–168). Association for Computing Machinery, Inc. https://doi.org/10.1145/3497776.3517780
