Effective multi-GPU communication using multiple CUDA streams and threads

Abstract

In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node.
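
The abstract describes the scheme only in words; the following is a minimal, hypothetical CUDA/OpenMP sketch of how such an overlap can be structured: one OpenMP thread per GPU, with one stream for the interior computation and one stream per halo direction, so transfers in both directions stay in flight while the interior kernel runs. All names (timestep, stencil_interior, stencil_boundary), the kernel bodies, and the pointer offsets are illustrative assumptions rather than the authors' code, and peer access between neighbouring GPUs is assumed to have been enabled beforehand with cudaDeviceEnablePeerAccess.

```cuda
// multi_gpu_overlap.cu -- a hypothetical sketch, not the authors' implementation.
#include <cuda_runtime.h>
#include <omp.h>

// Placeholder kernels standing in for the paper's 3D stencil updates: one for the
// interior of the local subdomain, one for a boundary plane whose fresh values
// must be shipped to a neighbouring GPU. The real stencil bodies are omitted.
__global__ void stencil_interior(const double *in, double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];   // stand-in for the real stencil update
}
__global__ void stencil_boundary(const double *in, double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];   // stand-in for the real stencil update
}

// One time step of the overlap scheme. Each OpenMP thread owns one GPU and three
// CUDA streams, so halo traffic in both directions proceeds while the interior
// kernel runs. All offsets and sizes are illustrative.
void timestep(double **d_in, double **d_out, int num_gpus,
              int interior_pts, int boundary_pts,
              size_t top_offset, size_t halo_bytes)
{
    #pragma omp parallel num_threads(num_gpus)
    {
        int g = omp_get_thread_num();
        cudaSetDevice(g);

        cudaStream_t s_compute, s_down, s_up;  // interior, traffic to g-1, traffic to g+1
        cudaStreamCreate(&s_compute);
        cudaStreamCreate(&s_down);
        cudaStreamCreate(&s_up);

        const int tpb   = 256;
        const int blk_b = (boundary_pts + tpb - 1) / tpb;
        const int blk_i = (interior_pts + tpb - 1) / tpb;

        // 1. Update the two boundary planes first, each on the stream that will
        //    later carry its halo, so the copy is stream-ordered after the compute.
        stencil_boundary<<<blk_b, tpb, 0, s_down>>>(d_in[g], d_out[g], boundary_pts);
        stencil_boundary<<<blk_b, tpb, 0, s_up>>>(d_in[g] + top_offset,
                                                  d_out[g] + top_offset, boundary_pts);

        // 2. Launch the interior update on its own stream; it overlaps with the
        //    halo exchanges issued below.
        stencil_interior<<<blk_i, tpb, 0, s_compute>>>(d_in[g], d_out[g], interior_pts);

        // 3. Exchange halos with both neighbours concurrently via peer-to-peer
        //    copies over PCIe, one direction per stream. Destination offsets into
        //    the neighbours' ghost layers are omitted for brevity.
        if (g > 0)
            cudaMemcpyPeerAsync(d_out[g - 1], g - 1, d_out[g], g, halo_bytes, s_down);
        if (g < num_gpus - 1)
            cudaMemcpyPeerAsync(d_out[g + 1], g + 1, d_out[g] + top_offset, g,
                                halo_bytes, s_up);

        // 4. Wait for this GPU's streams; the implicit barrier at the end of the
        //    OpenMP parallel region then synchronizes all GPUs before the next step.
        cudaDeviceSynchronize();

        cudaStreamDestroy(s_compute);
        cudaStreamDestroy(s_down);
        cudaStreamDestroy(s_up);
    }
}
```

The key design point is that each transfer direction gets its own stream and each GPU its own host thread, so sends and receives over the shared PCIe bus can proceed simultaneously instead of being serialized behind a single stream.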

Citation (APA)

Sourouri, M., Gillberg, T., Baden, S. B., & Cai, X. (2014). Effective multi-GPU communication using multiple CUDA streams and threads. In Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS (Vol. 2015-April, pp. 981–986). IEEE Computer Society. https://doi.org/10.1109/PADSW.2014.7097919
