A critical optimization in the domain of linear signal transforms, such as the discrete Fourier transform (DFT), is loop merging, which increases data locality and reuse and thus performance. In particular, this includes the conversion of shuffle operations into array reindexings. To date, loop merging is well understood only for the DFT, and only for Cooley-Tukey FFT based algorithms, which excludes DFT sizes divisible by large primes. In this paper, we present a formal loop merging framework for general signal transforms and its implementation within the SPIRAL code generator. The framework consists of Ε-SPL, a mathematical language to express loops and index mappings; a rewriting system to merge loops in Ε-SPL and a compiler that translates Ε-SPL into code. We apply the framework to DFT sizes that cannot be handled using only the Cooley-Tukey FFT and compare our method to FFTW 3.0.1 and the vendor library Intel MKL 7.2.1. Compared to FFTW our generated code is a factor of 2--4 faster under equal implementation conditions (same algorithms, same unrolling threshold). For some sizes we show a speed-up of a factor of 9 using Bluestein's algorithm. Further, we give a detailed comparison against the Intel vendor library MKL; our generated code is between 2 times faster and 4.5 times slower.
Mendeley saves you time finding and organizing research
Choose a citation style from the tabs below