Iterative stencil loops are used in scientific programs to implement relaxation methods for numerical simulation and signal processing. Such loops iteratively modify the same array elements over different time steps, which presents opportunities for the compiler to improve the temporal data locality through loop tiling. This article presents a compiler framework for automatic tiling of iterative stencil loops, with the objective of improving the cache performance. The article first presents a technique which allows loop tiling to satisfy data dependences in spite of the difficulty created by imperfectly nested inner loops. It does so by skewing the inner loops over the time steps and by applying a uniform skew factor to all loops at the same nesting level. Based on a memory cost analysis, the article shows that the skew factor must be minimized at every loop level in order to minimize cache misses. A graph-theoretical algorithm, which takes polynomial time, is presented to determine the minimum skew factor. Furthermore, the memory-cost analysis derives the tile size which minimizes capacity misses. Given the tile size, an efficient and general array-padding scheme is applied to remove conflict misses. Experiments were conducted on 16 test programs and preliminary results showed an average speedup of 1.58 and a maximum speedup of 5.06 across those test programs.
Mendeley saves you time finding and organizing research
Choose a citation style from the tabs below