Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

Abstract

This paper addresses the challenge of harnessing the heterogeneous communication architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an important example of which is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA). We propose a method for optimizing parallel implementations of heterogeneous stencil computations that combines the islands-of-cores strategy with (3 + 1)D decomposition. The method allows flexible management of the trade-off between computation and communication costs in accordance with the features of modern ccNUMA architectures. Its efficiency is demonstrated for MPDATA implementations on the SGI UV 2000 and UV 3000 servers, as well as on 2- and 4-socket ccNUMA platforms based on various Intel CPU architectures, including Skylake, Broadwell, and Haswell.

Cite

Szustak, L., Halbiniak, K., Wyrzykowski, R., & Jakl, O. (2019). Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations. Journal of Supercomputing, 75(12), 7765–7777. https://doi.org/10.1007/s11227-018-2460-0
