Designing high performance communication runtime for GPU managed memory: Early experiences

Abstract

Graphics Processing Units (GPUs) have become mainstream accelerators due to their low power footprint and massive parallelism. From CUDA 6.0 onward, NVIDIA has provided the Managed Memory capability, which unifies host and device memory into a single allocation and removes the need for explicit memory transfers between the two. Applications, particularly irregular ones, can benefit immensely from managed memory because of the high programming productivity it offers: data management and movement require minimal effort. The MVAPICH2 library uses runtime designs such as CUDA Inter-Process Communication (IPC) and GPUDirect RDMA (GDR) under the CUDA-Aware concept to offer high productivity and programmability with MPI on modern clusters. However, integrating managed memory with these features raises challenges for efficient small- and large-message communication. In this study, we present an initial evaluation of the managed memory capability and its interaction with the existing high-performance designs and features in the MVAPICH2 library. We propose new designs that enable efficient communication between managed memory buffers, and we fine-tune the transfers between managed memories residing on GPUs. To the best of our knowledge, this is the first evaluation and study of managed memory and its interaction with MPI runtimes. We present a detailed evaluation and analysis of the performance of the proposed designs. The Stencil2D communication kernel from the SHOC suite was redesigned to enable managed memory support; the evaluation shows a 4x improvement in stencil exchange times on 16 GPU nodes.
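The sketch below is a minimal illustration (not taken from the paper) of the usage pattern the abstract describes: a buffer allocated with cudaMallocManaged is handed directly to a CUDA-aware MPI call, with no explicit host/device copies. The kernel, message size, and pairwise exchange are illustrative placeholders, not the paper's Stencil2D benchmark.

```c
/* Sketch: managed-memory buffer passed directly to a CUDA-aware MPI call.
 * Assumes a CUDA-aware MPI library such as MVAPICH2. The message size and
 * neighbor-exchange pattern are illustrative only. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void init(double *buf, int n, int rank) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = (double)rank;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;            /* illustrative message size */
    double *buf;
    /* Single allocation visible to both host and device: no explicit
     * cudaMemcpy between separate host and device copies is needed. */
    cudaMallocManaged(&buf, n * sizeof(double));

    init<<<(n + 255) / 256, 256>>>(buf, n, rank);
    cudaDeviceSynchronize();

    /* The managed pointer is passed to MPI directly; the runtime decides
     * how to move the data (e.g. IPC, GDR, or host staging). */
    int peer = rank ^ 1;              /* simple pairwise exchange */
    if (peer < size) {
        MPI_Sendrecv_replace(buf, n, MPI_DOUBLE, peer, 0, peer, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    if (rank == 0) printf("buf[0] = %f\n", buf[0]);
    cudaFree(buf);
    MPI_Finalize();
    return 0;
}
```

The productivity benefit noted in the abstract is visible here: the same pointer is used by the kernel, the MPI call, and host-side printing, leaving the placement and movement decisions to the CUDA driver and the MPI runtime.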

Citation (APA)

Banerjee, D. S., Hamidouche, K., & Panda, D. K. (2016). Designing high performance communication runtime for GPU managed memory: Early experiences. In 9th Workshop on General Purpose Processing using GPUs, GPGPU 2016 - Proceedings (pp. 82–91). Association for Computing Machinery, Inc. https://doi.org/10.1145/2884045.2884050
