A case for non-blocking collectives in OpenSHMEM: Design, implementation, and performance evaluation using MVAPICH2-X

1Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.
Get full text

Abstract

An ever increased push for performance in the HPC arena has led to a multitude of hybrid architectures in both software and hardware for HPC systems. Partitioned Global Address Space (PGAS) programming model has gained a lot of attention over the last couple of years. The main advantage of PGAS model is the ease of programming provided by the abstraction of a single memory across nodes of a cluster. OpenSHMEM implementations currently implement the OpenSHMEM 1.2 specification that provides interface for one-sided, atomic, and collective operations. However, the recent trend in HPC arena in general, and Message Passing Interface (MPI) community in specific, is to use Non-Blocking Collective (NBC) communication to efficiently overlap computation with communication to save precious CPU cycles. This work is inspired by encouraging performance numbers for NBC implementations of various MPI libraries. As the OpenSHMEM community has been discussing the use of non-blocking communication, in this paper, we propose an NBC interface for OpenSHMEM, present its design, implementation, and performance evaluation. We discuss the NBC interface that has been modeled along the lines of MPI NBC interface and requires minimal changes to the function signatures. We have designed and implemented this interface using the Unified Communication Runtime in MVAPICH2-X. In addition, we propose OpenSHMEM NBC benchmarks as an extension to the OpenSHMEM benchmarks available in the widely used OMB suite. Our performance evaluation shows that the proposed NBC implementation provides up to 96 percent overlap for different collectives with little NBC overhead.

Cite

CITATION STYLE

APA

Awan, A. A., Hamidouche, K., Chu, C. H., & Panda, D. (2015). A case for non-blocking collectives in OpenSHMEM: Design, implementation, and performance evaluation using MVAPICH2-X. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9397, pp. 69–86). Springer Verlag. https://doi.org/10.1007/978-3-319-26428-8_5

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free