Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

Lukasz Szustak; Kamil Halbiniak; Roman Wyrzykowski; Ondřej Jakl

Journal Article (Open Access)

Journal of Supercomputing (2019) 75(12) 7765-7777

DOI: 10.1007/s11227-018-2460-0

Abstract

This paper addresses the challenge of harnessing the heterogeneous communication architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an important example of which is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA). We propose a method for optimizing parallel implementations of heterogeneous stencil computations that combines the islands-of-core strategy with (3 + 1)D decomposition. The method allows flexible management of the trade-off between computation and communication costs in accordance with the features of modern ccNUMA architectures. Its efficiency is demonstrated for implementations of MPDATA on the SGI UV 2000 and UV 3000 servers, as well as on 2- and 4-socket ccNUMA platforms based on various Intel CPU architectures, including Skylake, Broadwell, and Haswell.
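The trade-off between computation and communication mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation and does not model MPDATA or the (3 + 1)D decomposition. It assumes a 1D domain, a chain of two 3-point averaging stencils, and hypothetical function names (`stage`, `two_stage_global`, `two_stage_islands`). Each "island" recomputes the intermediate cells it needs at its borders, so islands never exchange intermediate results, at the price of some redundant stencil updates.

```python
def stage(values):
    """Apply one 3-point averaging stencil; the output is 2 cells shorter."""
    return [(values[i - 1] + values[i] + values[i + 1]) / 3.0
            for i in range(1, len(values) - 1)]


def two_stage_global(u):
    """Reference result: both stencil stages applied over the whole domain."""
    return stage(stage(u))


def two_stage_islands(u, n_islands):
    """Compute the same result island by island (toy model, assumed setup).

    Every island reads a slightly larger slice of the input and recomputes
    the border cells of the intermediate stage itself, so no intermediate
    data has to cross island boundaries. Returns the result and the total
    number of stencil updates performed across all islands.
    """
    n_out = len(u) - 4                    # each stage trims one cell per side
    chunk = -(-n_out // n_islands)        # ceiling division
    result, updates = [], 0
    for start in range(0, n_out, chunk):
        stop = min(start + chunk, n_out)
        local = u[start:stop + 4]         # outputs [start, stop) need inputs [start, stop + 4)
        intermediate = stage(local)       # includes redundantly computed border cells
        out = stage(intermediate)
        updates += len(intermediate) + len(out)
        result.extend(out)
    return result, updates


if __name__ == "__main__":
    u = [float(i * i % 7) for i in range(20)]
    for k in (1, 2, 4):
        _, updates = two_stage_islands(u, k)
        print(f"islands={k} stencil updates={updates}")
```

In this toy setting every additional island adds two redundant updates, while removing the need to share intermediate values between neighbouring islands. On a real ccNUMA machine the balance depends on the cost of remote memory accesses relative to the extra arithmetic, which is the trade-off the proposed method is described as managing.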

Cite (APA)

Szustak, L., Halbiniak, K., Wyrzykowski, R., & Jakl, O. (2019). Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations. Journal of Supercomputing, 75(12), 7765–7777. https://doi.org/10.1007/s11227-018-2460-0
