In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node.
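The ordering described above (send and receive at the same time while the stencil computation proceeds) can be sketched on the host. This is a minimal illustration, not the authors' implementation: two `std::thread` copies stand in for transfers issued on separate CUDA streams by separate OpenMP threads, plain `std::vector` buffers stand in for GPU memory, and every name and size (`NX`, `NY`, `NZ`, `stencil`, `copy_plane`, `overlapped_step`, `max_mismatch`) is hypothetical. The domain is split in two along z. The planes that do not depend on ghost data are updated while both halo copies are in flight, and the two planes next to the cut are updated only after the copies finish.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical sizes: each subdomain owns NZ planes plus one plane on either side.
constexpr int NX = 8, NY = 8, NZ = 4;
constexpr std::size_t PLANE = std::size_t(NX) * NY;

inline std::size_t idx(int x, int y, int z) { return (std::size_t(z) * NY + y) * NX + x; }

// 7-point Jacobi update of planes z0..z1 (inclusive); x/y edge points are left untouched.
void stencil(const std::vector<double>& in, std::vector<double>& out, int z0, int z1) {
    for (int z = z0; z <= z1; ++z)
        for (int y = 1; y < NY - 1; ++y)
            for (int x = 1; x < NX - 1; ++x)
                out[idx(x, y, z)] = (in[idx(x - 1, y, z)] + in[idx(x + 1, y, z)] +
                                     in[idx(x, y - 1, z)] + in[idx(x, y + 1, z)] +
                                     in[idx(x, y, z - 1)] + in[idx(x, y, z + 1)]) / 6.0;
}

// Copies one z-plane; a stand-in for an asynchronous GPU-to-GPU halo transfer.
void copy_plane(const std::vector<double>& src, int zs, std::vector<double>& dst, int zd) {
    std::memcpy(&dst[idx(0, 0, zd)], &src[idx(0, 0, zs)], PLANE * sizeof(double));
}

// One time step for subdomains A (lower half) and B (upper half).
void overlapped_step(std::vector<double>& a_in, std::vector<double>& a_out,
                     std::vector<double>& b_in, std::vector<double>& b_out) {
    // Both directions are started together, so sending and receiving overlap.
    std::thread a_to_b([&] { copy_plane(a_in, NZ, b_in, 0); });      // A's top plane -> B's ghost
    std::thread b_to_a([&] { copy_plane(b_in, 1, a_in, NZ + 1); });  // B's bottom plane -> A's ghost
    // Planes that never read a ghost plane are computed while the copies run.
    stencil(a_in, a_out, 1, NZ - 1);
    stencil(b_in, b_out, 2, NZ);
    a_to_b.join();
    b_to_a.join();
    // The planes next to the cut need the freshly received ghost data.
    stencil(a_in, a_out, NZ, NZ);
    stencil(b_in, b_out, 1, 1);
    a_in.swap(a_out);
    b_in.swap(b_out);
}

// Runs `steps` iterations on the split domain and on an unsplit reference,
// and returns the largest absolute difference over all owned planes.
double max_mismatch(int steps) {
    const int GZ = 2 * NZ + 2;  // global planes, including two fixed z-boundary planes
    std::vector<double> g_in(GZ * PLANE), a_in((NZ + 2) * PLANE), b_in((NZ + 2) * PLANE);
    for (int z = 0; z < GZ; ++z)
        for (int y = 0; y < NY; ++y)
            for (int x = 0; x < NX; ++x)
                g_in[idx(x, y, z)] = double((7 * x + 13 * y + 5 * z * z) % 11);
    // A holds global planes 0..NZ+1, B holds global planes NZ..2*NZ+1.
    std::copy(g_in.begin(), g_in.begin() + (NZ + 2) * PLANE, a_in.begin());
    std::copy(g_in.begin() + NZ * PLANE, g_in.end(), b_in.begin());
    std::vector<double> g_out = g_in, a_out = a_in, b_out = b_in;
    for (int s = 0; s < steps; ++s) {
        stencil(g_in, g_out, 1, GZ - 2);
        g_in.swap(g_out);
        overlapped_step(a_in, a_out, b_in, b_out);
    }
    double worst = 0.0;
    for (int z = 1; z <= NZ; ++z)
        for (std::size_t i = 0; i < PLANE; ++i) {
            worst = std::max(worst, std::fabs(a_in[z * PLANE + i] - g_in[z * PLANE + i]));
            worst = std::max(worst, std::fabs(b_in[z * PLANE + i] - g_in[(z + NZ) * PLANE + i]));
        }
    return worst;
}
```

Calling `max_mismatch(5)` from a `main` function (built with thread support, for example `-pthread` on GCC or Clang) checks that the overlapped schedule produces the same field as the unsplit update. In a GPU version, the two copy threads would plausibly become asynchronous peer copies on separate streams, but that mapping is an assumption of this sketch and not a detail taken from the paper.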
Sourouri, M., Gillberg, T., Baden, S. B., & Cai, X. (2014). Effective multi-GPU communication using multiple CUDA streams and threads. In Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS (Vol. 2015-April, pp. 981–986). IEEE Computer Society. https://doi.org/10.1109/PADSW.2014.7097919