Accelerator devices are increasingly used to build large supercomputers, and current installations usually include more than one accelerator per system node. To keep all devices busy, kernels have to be executed concurrently, which can be achieved via asynchronous kernel launches. This work compares the performance of an implementation of the Conjugate Gradient method with CUDA, OpenCL, and OpenACC on NVIDIA Pascal GPUs. Furthermore, it examines Intel Xeon Phi coprocessors when programmed with OpenCL and OpenMP. In doing so, it tries to answer the question of whether the higher abstraction level of directive-based models incurs a performance penalty compared to lower-level paradigms.
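To illustrate the central idea of the abstract, the following is a minimal CUDA sketch (not the paper's benchmark code) of asynchronous kernel launches across multiple devices: each launch returns immediately, so a single host thread can dispatch work to every GPU before waiting on any of them. The device count, the `scale` kernel, and the grid/block sizes are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: scales a vector in place.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    const int n = 1 << 20;

    cudaStream_t *streams = new cudaStream_t[ndev];
    float **bufs = new float *[ndev];

    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);                  // select device d
        cudaStreamCreate(&streams[d]);     // one stream per device
        cudaMalloc(&bufs[d], n * sizeof(float));
        // Asynchronous launch: the call returns immediately, so the
        // loop can hand work to the next device while this one computes.
        scale<<<(n + 255) / 256, 256, 0, streams[d]>>>(bufs[d], 2.0f, n);
    }
    for (int d = 0; d < ndev; ++d) {       // only now wait for all devices
        cudaSetDevice(d);
        cudaStreamSynchronize(streams[d]);
        cudaFree(bufs[d]);
    }
    delete[] streams;
    delete[] bufs;
    return 0;
}
```

OpenCL expresses the same pattern with one command queue per device, and OpenACC with the `async` clause; the paper compares how well each model sustains this concurrency.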
Hahnfeld, J., Terboven, C., Price, J., Pflug, H. J., & Müller, M. S. (2018). Evaluation of asynchronous offloading capabilities of accelerator programming models for multiple devices. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10732 LNCS, pp. 160–182). Springer Verlag. https://doi.org/10.1007/978-3-319-74896-2_9