Detailed Modeling of Heterogeneous and Contention-Constrained Point-to-Point MPI Communication

Abstract

The network topology of modern parallel computing systems is inherently heterogeneous, with a variety of latency and bandwidth values. Moreover, contention for bandwidth can arise at different levels when many processes communicate with each other. Many-pair, point-to-point MPI communication is thus characterized by heterogeneity and contention, even on a cluster of homogeneous multicore CPU nodes. To get a detailed understanding of the individual communication cost per MPI process, we propose a new modeling methodology that incorporates both heterogeneity and contention. First, we improve the standard max-rate model to better quantify the bandwidth that is actually achievable, depending on the number of competing MPI processes. Then, we extend the model further to describe bandwidth contention in greater detail when the competing MPI processes have different numbers of neighbors and non-uniform message sizes. Thereafter, we add more flexibility by considering interactions between intra-socket and inter-socket messaging. Through a series of experiments on different processor architectures, we show that the new heterogeneous and contention-constrained performance models can adequately explain the individual communication cost associated with each MPI process. The largest test of realistic point-to-point MPI communication involves 8,192 processes and a total of 2,744,632 simultaneous messages over 64 dual-socket AMD Epyc Rome compute nodes connected by InfiniBand, for which the overall prediction accuracy achieved is 84%.
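For orientation, the standard max-rate model that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the baseline model as it is commonly stated in the literature, not the improved or extended models proposed in the article. The function name, parameter names, and numeric values are assumptions chosen for illustration only.

```python
def max_rate_time(msg_bytes, ppn, latency, rate_per_proc, rate_node_max):
    """Estimated time for each of `ppn` processes on one node to send one message.

    Baseline max-rate form (assumed here):
        T = latency + ppn * msg_bytes / min(rate_node_max, ppn * rate_per_proc)

    msg_bytes      message size per process, in bytes
    ppn            number of processes on the node sending at the same time
    latency        start-up latency, in seconds
    rate_per_proc  bandwidth one process can reach alone, in bytes/second
    rate_node_max  maximum injection bandwidth of the node, in bytes/second
    """
    if ppn < 1:
        raise ValueError("ppn must be at least 1")
    # The aggregate rate grows with the number of senders until the node's limit is hit.
    effective_rate = min(rate_node_max, ppn * rate_per_proc)
    return latency + ppn * msg_bytes / effective_rate


if __name__ == "__main__":
    # Hypothetical hardware parameters, not measurements from the article.
    for ppn in (1, 2, 4, 8):
        t = max_rate_time(1_000_000, ppn, 1e-6, 1e9, 4e9)
        print(f"ppn={ppn}: {t * 1e3:.3f} ms")
```

With these hypothetical parameters, the per-message time stays flat while the combined per-process rate is below the node limit and grows once the limit is reached, which is the contention effect the article sets out to model in more detail.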

Cite

Thune, A., Reinemo, S. A., Skeie, T., & Cai, X. (2023). Detailed Modeling of Heterogeneous and Contention-Constrained Point-to-Point MPI Communication. IEEE Transactions on Parallel and Distributed Systems, 34(5), 1580–1593. https://doi.org/10.1109/TPDS.2023.3253881
