GPUs are an attractive target for data-parallel stencil computations, which are prevalent in scientific computing and image processing applications. Many tiling schemes, such as overlapped tiling and split tiling, have been proposed in the past to improve the performance of stencil computations. While effective for 2D stencils, these techniques do not achieve the desired improvements for 3D stencils due to the hardware constraints of GPUs. A major challenge in optimizing stencil computations is to effectively utilize all resources available on the GPU. In this paper, we develop a tiling strategy that makes better use of hardware resources such as shared memory and the register file. We present a systematic methodology to reason about which strategy should be employed for a given stencil, and we discuss implementation choices that have a significant effect on the achieved performance. Applying these techniques to various 2D and 3D stencils gives a performance improvement of 200-400% over existing tools that target such computations.
Rawat, P. S., Hong, C., Ravishankar, M., Grover, V., Pouchet, L. N., & Sadayappan, P. (2016). Effective resource management for enhancing performance of 2D and 3D stencils on GPUs. In 9th Workshop on General Purpose Processing using GPUs, GPGPU 2016 - Proceedings (pp. 92–102). Association for Computing Machinery, Inc. https://doi.org/10.1145/2884045.2884047