Effective resource management for enhancing performance of 2D and 3D stencils on GPUs


Abstract

GPUs are an attractive target for the data-parallel stencil computations prevalent in scientific computing and image processing applications. Many tiling schemes, such as overlapped tiling and split tiling, have been proposed in the past to improve the performance of stencil computations. While effective for 2D stencils, these techniques do not achieve the desired improvements for 3D stencils because of the hardware constraints of GPUs. A major challenge in optimizing stencil computations is to effectively utilize all the resources available on the GPU. In this paper, we develop a tiling strategy that makes better use of hardware resources such as shared memory and the register file. We present a systematic methodology for reasoning about which strategy should be employed for a given stencil, and we discuss implementation choices that have a significant effect on the achieved performance. Applying these techniques to various 2D and 3D stencils gives a performance improvement of 200-400% over existing tools that target such computations.
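
For readers unfamiliar with the overlapped tiling mentioned in the abstract, the following is a minimal CPU-side sketch in Python. It is not the paper's method or code. It only illustrates the general idea, assuming a 2D 5-point Jacobi stencil with fixed boundary values. The function names (`jacobi_step`, `jacobi_overlapped`), the coefficient 0.2, and the tile size are hypothetical choices made for illustration. The per-tile `local` buffer plays the role that shared memory would play on a GPU: each tile loads its own points plus a halo as wide as the number of fused time steps, recomputes the halo redundantly, and writes back only its own points.

```python
# Illustrative CPU sketch of overlapped tiling for a 2D 5-point Jacobi stencil.
# Not the paper's implementation; names, coefficient, and tile size are assumptions.

def jacobi_step(grid):
    """One reference time step: average each interior point with its 4 neighbours."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def jacobi_overlapped(grid, steps, tile=4):
    """Advance `steps` time steps tile by tile, using a halo of width `steps`."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ti in range(0, n, tile):
        for tj in range(0, m, tile):
            # Load the tile plus its halo, clamped to the grid edges.
            # This local copy stands in for a GPU shared-memory buffer.
            lo_i, hi_i = max(ti - steps, 0), min(ti + tile + steps, n)
            lo_j, hi_j = max(tj - steps, 0), min(tj + tile + steps, m)
            local = [row[lo_j:hi_j] for row in grid[lo_i:hi_i]]
            # Fuse all time steps on the local copy. Halo points go stale
            # one ring per step, but the tile's own points stay valid.
            for _ in range(steps):
                local = jacobi_step(local)
            # Write back only the points this tile owns.
            for i in range(ti, min(ti + tile, n)):
                for j in range(tj, min(tj + tile, m)):
                    out[i][j] = local[i - lo_i][j - lo_j]
    return out
```

The sketch makes the trade-off visible: fusing more time steps widens the halo, so each tile needs a larger local buffer and does more redundant work. In 3D the halo grows along three dimensions, which gives some intuition for why on-chip resources such as shared memory and registers become the limiting factor.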

Cite

Rawat, P. S., Hong, C., Ravishankar, M., Grover, V., Pouchet, L. N., & Sadayappan, P. (2016). Effective resource management for enhancing performance of 2D and 3D stencils on GPUs. In 9th Workshop on General Purpose Processing using GPUs, GPGPU 2016 - Proceedings (pp. 92–102). Association for Computing Machinery, Inc. https://doi.org/10.1145/2884045.2884047
