A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application

Abstract

Performance portability is becoming more and more important as next-generation high performance computing systems grow increasingly diverse and heterogeneous. Several new approaches to parallel programming, such as SYCL and Kokkos, have been developed in recent years to tackle this challenge. While several studies have been published evaluating these new programming models, they have tended to focus on memory-bandwidth bound applications. In this paper, we analyse the performance of what appear to be the most promising modern parallel programming models, on a diverse range of contemporary high-performance hardware, using a compute-bound molecular docking mini-app. We present miniBUDE, a mini-app for BUDE, the Bristol University Docking Engine, a real application routinely used for drug discovery. We benchmark miniBUDE on real-world inputs for the full-scale application in order to follow its performance profile closely in the mini-app. We implement the mini-app in different programming models targeting both CPUs and GPUs, including SYCL and Kokkos, two of the more promising and widely used modern parallel programming models. We then present an analysis of the performance of each implementation, which we compare to highly optimised baselines set using established programming models such as OpenMP, OpenCL, and CUDA. Our study includes a wide variety of modern hardware platforms covering CPUs based on x86 and Arm architectures, as well as GPUs. We found that, with the emerging parallel programming models, we could achieve performance comparable to that of the established models, and that a higher-level framework such as SYCL can achieve OpenMP levels of performance while aiding productivity. We identify a set of key challenges and pitfalls to take into account when adopting these emerging programming models, some of which are implementation-specific effects and not fundamental design errors that would prevent further adoption. Finally, we discuss our findings in the wider context of performance-portable compute-bound workloads.

Cite

Poenaru, A., Lin, W. C., & McIntosh-Smith, S. (2021). A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12728 LNCS, pp. 332–350). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-78713-4_18
