This paper describes an algorithm for scalar replacement, which replaces repeated accesses to an array element with a scalar temporary. The element is then accessed from a register rather than from memory, eliminating unnecessary memory accesses. A previous approach to this problem combines scalar replacement with a loop transformation called unroll-and-jam, whereby outer loops in a nest are unrolled and the resulting duplicate inner loop bodies are fused together. The effect of unroll-and-jam is to bring opportunities for scalar replacement into inner loop bodies. In this paper, we describe an alternative approach that can exploit reuse opportunities across multiple loops in a nest without requiring unroll-and-jam. We also use this technique to eliminate unnecessary writes back to memory. The approach is particularly well suited to architectures with large register files and efficient mechanisms for register-to-register transfer. In our experiments mapping five multimedia kernels to an FPGA platform, assuming 32 registers, we observe a 58 to 90 percent reduction in memory accesses and a speedup of 2.34 to 7.31 over the original programs. © Springer-Verlag 2001.
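As a rough illustration of the idea, the sketch below applies scalar replacement by hand to a small FIR-style loop nest. It is not the algorithm from the paper: the kernel, the names `fir_baseline` and `fir_scalar_replaced`, and the fixed `TAPS` value are all assumptions made for this example, and the inner loop is fully unrolled purely to keep the code short. It shows three effects mentioned above: reuse of input elements carried by the outer loop, register-to-register transfer (the rotation of `x0`..`x2`), and removal of redundant writes to `out[j]`.

```c
#include <assert.h>

enum { TAPS = 4 }; /* hypothetical filter length chosen for this sketch */

/* Baseline: every iteration of the inner loop reads in[], coef[] and
   reads and writes out[j] in memory. */
void fir_baseline(const int *in, const int *coef, int *out, int n_out) {
    assert(in && coef && out);
    for (int j = 0; j < n_out; j++) {
        out[j] = 0;
        for (int i = 0; i < TAPS; i++)
            out[j] += coef[i] * in[j + i];
    }
}

/* Hand-applied scalar replacement. The input must hold n_out + TAPS - 1
   elements, exactly as in the baseline. */
void fir_scalar_replaced(const int *in, const int *coef, int *out, int n_out) {
    assert(in && coef && out);
    if (n_out <= 0)
        return;

    /* coef[] does not depend on j: load each coefficient once. */
    int c0 = coef[0], c1 = coef[1], c2 = coef[2], c3 = coef[3];

    /* Sliding window over in[]: three of the four values needed for
       output j were already loaded for output j - 1. */
    int x0 = in[0], x1 = in[1], x2 = in[2];

    for (int j = 0; j < n_out; j++) {
        int x3 = in[j + 3];                       /* the only new load */
        int acc = c0 * x0 + c1 * x1 + c2 * x2 + c3 * x3;
        out[j] = acc;                             /* one store per output */

        /* Register-to-register transfer: shift the window by one. */
        x0 = x1;
        x1 = x2;
        x2 = x3;
    }
}
```

In this sketch, reads of `in[]` drop from `4 * n_out` to `n_out + 3`, and `out[j]` is written once per output instead of being updated in memory on every inner iteration. The cost is the extra scalars held live across outer-loop iterations, which is why register file size and cheap register moves matter for this style of transformation.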
So, B., & Hall, M. (2004). Increasing the applicability of scalar replacement. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2985, 185–201. https://doi.org/10.1007/978-3-540-24723-4_13