This paper addresses the challenge of harnessing the heterogeneous communication architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an important example of which is the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA). We propose a method for optimizing parallel implementations of heterogeneous stencil computations that combines the islands-of-core strategy with (3 + 1)D decomposition. The method allows flexible management of the trade-off between computation and communication costs in accordance with the features of modern ccNUMA architectures. Its efficiency is demonstrated for implementations of MPDATA on the SGI UV 2000 and UV 3000 servers, as well as on 2- and 4-socket ccNUMA platforms based on various Intel CPU architectures, including Skylake, Broadwell, and Haswell.
Szustak, L., Halbiniak, K., Wyrzykowski, R., & Jakl, O. (2019). Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations. Journal of Supercomputing, 75(12), 7765–7777. https://doi.org/10.1007/s11227-018-2460-0