Multi-GPU implementations of parallel 3D sweeping algorithms with application to geological folding

Abstract

This paper studies the CUDA programming challenges of using multiple GPUs inside a single machine to carry out plane-by-plane updates in parallel 3D sweeping algorithms. In particular, care must be taken to mask the overhead of the various data movements between the GPUs. Multiple OpenMP threads on the CPU side should be combined with multiple CUDA streams per GPU to hide the data transfer cost related to the halo computation on each 2D plane. Moreover, peer-to-peer data motion can be used to reduce the impact of the 3D volumetric data shuffles that have to be done between mandatory changes of the grid partitioning. We have investigated the performance improvement of 2- and 4-GPU implementations that are applicable to 3D anisotropic front propagation computations related to geological folding. Compared with a straightforward multi-GPU implementation, masking the data movements improved overall performance by 23% on four GPUs of the Fermi architecture and by 47% on four Kepler GPUs.
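To make the halo exchange mentioned in the abstract concrete, the following is a minimal CPU-only sketch in Python, not the authors' CUDA code. It assumes a toy stencil in which each node of a plane is relaxed from its 3x3 neighbourhood in the previous plane, and it splits each 2D plane between two hypothetical "devices" that each keep one halo row copied from the neighbour. The function names (`update_plane`, `sweep_single`, `sweep_partitioned`), the stencil, and the split position are all illustrative assumptions. In an actual multi-GPU code the two row copies would be asynchronous transfers that overlap with the computation; here they are plain list copies.

```python
INF = float("inf")

def update_plane(prev, cur):
    """Relax every node of `cur` from its 3x3 neighbourhood in `prev` (toy stencil)."""
    ny, nz = len(cur), len(cur[0])
    out = [row[:] for row in cur]
    for j in range(ny):
        for k in range(nz):
            best = INF
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    jj, kk = j + dj, k + dk
                    if 0 <= jj < ny and 0 <= kk < nz:
                        best = min(best, prev[jj][kk])
            out[j][k] = min(cur[j][k], best + 1.0)
    return out

def sweep_single(u):
    """Reference sweep along the first axis, one plane at a time, on one 'device'."""
    u = [[row[:] for row in plane] for plane in u]
    for i in range(1, len(u)):
        u[i] = update_plane(u[i - 1], u[i])
    return u

def sweep_partitioned(u, split):
    """Same sweep with each plane split at row `split` (1 <= split <= ny - 1)."""
    # Device A owns rows 0..split-1 and stores row `split` as its halo (last row).
    a = [[row[:] for row in plane[:split + 1]] for plane in u]
    # Device B owns rows split..ny-1 and stores row `split-1` as its halo (first row).
    b = [[row[:] for row in plane[split - 1:]] for plane in u]
    for i in range(1, len(u)):
        a[i] = update_plane(a[i - 1], a[i])
        b[i] = update_plane(b[i - 1], b[i])
        # Halo exchange: overwrite each halo row with the neighbour's freshly
        # updated boundary row before the next plane is processed.
        a[i][-1] = b[i][1][:]
        b[i][0] = a[i][-2][:]
    # Reassemble the global grid from the owned rows only.
    return [pa[:split] + pb[1:] for pa, pb in zip(a, b)]

def make_grid(nx, ny, nz, source):
    """Grid of INF with a single zero-valued source node in plane 0."""
    u = [[[INF] * nz for _ in range(ny)] for _ in range(nx)]
    j, k = source
    u[0][j][k] = 0.0
    return u

if __name__ == "__main__":
    grid = make_grid(4, 6, 3, source=(4, 1))
    same = sweep_partitioned(grid, split=3) == sweep_single(grid)
    print("partitioned sweep matches single-device sweep:", same)
```

The source node is placed on device B's side so that the front can only reach device A's rows through the exchanged halo row, which is the dependency that the transfers in a multi-GPU implementation have to satisfy after every plane.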

Cite

Krishnasamy, E., Sourouri, M., & Cai, X. (2015). Multi-GPU implementations of parallel 3D sweeping algorithms with application to geological folding. In Procedia Computer Science (Vol. 51, pp. 1494–1503). Elsevier B.V. https://doi.org/10.1016/j.procs.2015.05.339
