Abstract
Graphics processors are widely used as accelerators in modern supercomputers. Efficient parallelization and low-level optimization allow scientists to greatly boost the performance of their codes. Modern Nvidia GPUs support a low-level programming model, CUDA, along with high-level ones: OpenACC and OpenMP. While the low-level approach aims to exploit all capabilities of the SIMT GPU architecture through low-level C/C++ code, it demands significant effort from the programmer. The OpenACC and OpenMP programming models are the opposite of CUDA: the programmer only has to mark the blocks of code to be parallelized using pragmas. We compare the performance of CUDA, OpenMP, and OpenACC on a state-of-the-art Nvidia Tesla V100 GPU in typical scenarios that arise in scientific programming, such as matrix multiplication and regular memory access patterns, and we evaluate the performance of physical simulation codes implemented using these programming models. Moreover, we study the performance of matrix multiplication as implemented in vendor-optimized BLAS libraries for the Nvidia Tesla V100 GPU and a modern Intel Xeon processor.
Khalilov, M., & Timofeev, A. (2021). Performance analysis of CUDA, OpenACC and OpenMP programming models on Tesla V100 GPU. In Journal of Physics: Conference Series (Vol. 1740). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1740/1/012056