Optimized unrolling of nested loops

Vivek Sarkar

Conference Proceedings


Proceedings of the International Conference on Supercomputing (2000) 153-166

DOI: 10.1145/335231.335246


Abstract

In this paper, we address the problems of automatically selecting unroll factors for perfectly nested loops, and generating compact code for the selected unroll factors. Compared to past work, the contributions of our work include a) a more detailed cost model that includes ILP and I-cache considerations, b) a new code generation algorithm for unrolling nested loops that generates more compact code (with fewer remainder loops) than the unroll-and-jam transformation, and c) a new algorithm for efficiently enumerating feasible unroll vectors. Our experimental results confirm the wide applicability of our approach by showing a 2.2× speedup on matrix multiply, and an average 1.08× speedup on seven of the SPEC95fp benchmarks (with a 1.2× speedup for two benchmarks). These speedups are significant because the baseline compiler used for comparison is the IBM XL Fortran product compiler, which generates high-quality code with unrolling and software pipelining of innermost loops enabled. Larger performance improvements due to unrolling of nested loops can be expected on processors that have larger numbers of registers and larger degrees of instruction-level parallelism than the processor used for our measurements (PowerPC 604).
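To make "unroll vector" and "remainder loops" concrete, here is a minimal sketch of the classic unroll-and-jam transformation (the baseline the abstract compares against, not the paper's new code generation algorithm) applied to matrix multiply with the hypothetical unroll vector (2, 2) on the i and j loops. Python is used only to show the loop structure; the function names `matmul_naive` and `matmul_unroll_jam_2x2` are illustrative. In the jammed inner loop, four loads feed four multiply-adds, versus two loads per multiply-add in the original nest, at the cost of four live accumulators (the register pressure a cost model has to weigh).

```python
def matmul_naive(A, B):
    """Reference i-j-k nest computing C = A * B."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_unroll_jam_2x2(A, B):
    """Same product after unroll-and-jam with unroll vector (2, 2) on (i, j).

    Illustrative sketch of the classic transformation only.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    n2, p2 = n - n % 2, p - p % 2  # largest multiples of the unroll factors

    # Main nest: i and j each step by 2; the four copies of the k loop
    # are jammed into one, so each A and B element loaded is used twice.
    for i in range(0, n2, 2):
        for j in range(0, p2, 2):
            c00 = c01 = c10 = c11 = 0.0  # scalar accumulators (register candidates)
            for k in range(m):
                a0, a1 = A[i][k], A[i + 1][k]
                b0, b1 = B[k][j], B[k][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            C[i][j], C[i][j + 1] = c00, c01
            C[i + 1][j], C[i + 1][j + 1] = c10, c11

    # Remainder nest 1: leftover column when p is odd (rows already unrolled).
    for i in range(n2):
        for j in range(p2, p):
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(m))

    # Remainder nest 2: leftover row when n is odd (all columns).
    for i in range(n2, n):
        for j in range(p):
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(m))
    return C
```

Even this two-dimensional unroll vector needs two separate remainder nests to handle trip counts that are not multiples of the unroll factors; reducing that kind of code growth is what contribution b) of the abstract targets.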

Cite (APA)

Sarkar, V. (2000). Optimized unrolling of nested loops. In Proceedings of the International Conference on Supercomputing (pp. 153–166). ACM. https://doi.org/10.1145/335231.335246
