Effective multi-GPU communication using multiple CUDA streams and threads

Mohammed Sourouri; Tor Gillberg; Scott B. Baden; Xing Cai

Conference Proceedings (Open Access)

Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS (2014), Vol. 2015-April, pp. 981-986

DOI: 10.1109/PADSW.2014.7097919

22 Citations

37 Readers

Abstract

In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node.
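
The abstract only summarizes the scheme, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes two GPUs in one node and gives each GPU its own copy stream and compute stream, so that both halo transfers are in flight at once while a placeholder kernel runs on the interior. The names `exchange_halos` and `interior_update`, the buffer layout, and the one-OpenMP-thread-per-GPU mapping are all illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the interior stencil update: it reads no halo
// data, so it may run while the halos are being transferred.
__global__ void interior_update(float* interior, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) interior[i] = 0.5f * interior[i];
}

// Exchanges n floats (n > 0) in both directions between GPU 0 and GPU 1.
// Returns the number of mismatched halo values (0 on success),
// -1 if fewer than two GPUs are visible, -2 on a CUDA error.
int exchange_halos(size_t n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return -1;

    const size_t bytes = n * sizeof(float);
    float* send[2] = {nullptr, nullptr};
    float* recv[2] = {nullptr, nullptr};
    float* interior[2] = {nullptr, nullptr};
    cudaStream_t copy_stream[2], compute_stream[2];
    cudaError_t status[2] = {cudaSuccess, cudaSuccess};
    // Each device records errors in its own slot, so threads never share one.
    auto check = [&](int d, cudaError_t e) { if (e != cudaSuccess) status[d] = e; };

    // Serial setup: every buffer must exist before any peer copy targets it.
    for (int d = 0; d < 2; ++d) {
        std::vector<float> host(n, float(d + 1));  // GPU d sends the value d + 1
        check(d, cudaSetDevice(d));
        check(d, cudaMalloc(&send[d], bytes));
        check(d, cudaMalloc(&recv[d], bytes));
        check(d, cudaMalloc(&interior[d], bytes));
        check(d, cudaMemcpy(send[d], host.data(), bytes, cudaMemcpyHostToDevice));
        check(d, cudaMemset(recv[d], 0, bytes));
        check(d, cudaMemset(interior[d], 0, bytes));
        check(d, cudaStreamCreate(&copy_stream[d]));
        check(d, cudaStreamCreate(&compute_stream[d]));
    }
    if (status[0] != cudaSuccess || status[1] != cudaSuccess) return -2;

    // One host thread per GPU: both directions are issued concurrently, each
    // on its own stream, while the interior kernel runs on a separate stream.
    // Without OpenMP the loop runs serially; the asynchronous calls still overlap.
    #pragma omp parallel for num_threads(2)
    for (int d = 0; d < 2; ++d) {
        const int peer = 1 - d;
        const unsigned blocks = (unsigned)((n + 255) / 256);
        check(d, cudaSetDevice(d));  // the current device is per host thread
        interior_update<<<blocks, 256, 0, compute_stream[d]>>>(interior[d], n);
        check(d, cudaGetLastError());
        check(d, cudaMemcpyPeerAsync(recv[peer], peer, send[d], d, bytes,
                                     copy_stream[d]));
        check(d, cudaStreamSynchronize(copy_stream[d]));
        check(d, cudaStreamSynchronize(compute_stream[d]));
    }

    // Verify: GPU d must now hold its peer's value, (1 - d) + 1, in recv[d].
    int mismatches = 0;
    for (int d = 0; d < 2; ++d) {
        std::vector<float> host(n, 0.0f);
        check(d, cudaSetDevice(d));
        check(d, cudaMemcpy(host.data(), recv[d], bytes, cudaMemcpyDeviceToHost));
        for (float v : host) mismatches += (v != float(2 - d));
        cudaFree(send[d]);
        cudaFree(recv[d]);
        cudaFree(interior[d]);
        cudaStreamDestroy(copy_stream[d]);
        cudaStreamDestroy(compute_stream[d]);
    }
    return (status[0] == cudaSuccess && status[1] == cudaSuccess) ? mismatches : -2;
}
```

A caller on a machine with at least two GPUs could invoke `exchange_halos(1 << 20)` and treat a return value of 0 as success. To enable the OpenMP pragma, the file would typically be built with something like `nvcc -Xcompiler -fopenmp`. The sketch does not enable direct peer access; `cudaMemcpyPeerAsync` works either way, but how the data is routed then depends on the system.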

Citation (APA)

Sourouri, M., Gillberg, T., Baden, S. B., & Cai, X. (2014). Effective multi-GPU communication using multiple CUDA streams and threads. In Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS (Vol. 2015-April, pp. 981–986). IEEE Computer Society. https://doi.org/10.1109/PADSW.2014.7097919
