Abstract
OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing many details of the GPU architecture. However, such high-level programming models shift much of the burden of program optimisation onto compilers and runtime systems; otherwise, OpenMP programs can be slower than fully hand-tuned, or even naive, implementations written in low-level programming models such as CUDA. To study the potential performance improvements obtained by compiling and optimising high-level programs for GPU execution, in this paper we: 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100); and 2) conduct a comparative performance analysis of handwritten CUDA programs and GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
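To illustrate the target construct the abstract refers to, the following is a minimal sketch (not taken from the paper's benchmark suite) of offloading a loop to a GPU in standard C with OpenMP 4.x directives; the vector-add kernel, array names, and problem size are hypothetical.

#include <stdio.h>

#define N (1 << 20)

int main(void) {
    static float a[N], b[N], c[N];

    /* Initialise the inputs on the host. */
    for (int i = 0; i < N; i++) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    /* Offload the loop to the default accelerator: the map() clauses
       describe host-to-device and device-to-host data transfers, while
       "teams distribute parallel for" spreads the iterations across GPU
       thread blocks and threads. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:N], b[0:N]) map(from: c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[0] = %f\n", c[0]);
    return 0;
}

With the compilers evaluated in the paper, such a file is typically built with GPU offloading enabled through flags along the lines of "clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda" for clang/LLVM and "xlc_r -qsmp=omp -qoffload" for IBM XL; the exact options depend on the compiler version.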
Citation
Hayashi, A., Shirako, J., Tiotto, E., Ho, R., & Sarkar, V. (2019). Performance evaluation of OpenMP’s target construct on GPUs - exploring compiler optimisations. International Journal of High Performance Computing and Networking, 13(1), 54. https://doi.org/10.1504/ijhpcn.2019.097051