Optimizing GPU programs by tuning execution parameters is essential to realizing the full performance potential of GPU hardware. However, many of these optimizations do not ensure correctness and subtle errors can enter while optimizing a GPU program. Further, lack of formal models and the presence of non-trivial transformations prevent verification of optimizations. In this work, we verify transformations involved in tuning the execution parameter, block-size. First, we present a formal programming and execution model for GPUs, and then formalize block-size independence of GPU programs, which ensures tuning block-size preserves program semantics. Next, we present an inter-procedural analysis to verify block-size independence for synchronization-free GPU programs. Finally, we evaluate the analysis on the Nvidia CUDA SDK samples, where 35 global kernels are verified to be block-size independent.
CITATION STYLE
Alur, R., Devietti, J., & Singhania, N. (2018). Block-Size Independence for GPU Programs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11002 LNCS, pp. 107–126). Springer Verlag. https://doi.org/10.1007/978-3-319-99725-4_9
Mendeley helps you to discover research relevant for your work.