Novel architectures that leverage long, variable-length vectors, such as the NEC SX-Aurora or the vector extension of RISC-V, are emerging as promising solutions in the supercomputing market. These architectures often require scientific kernels to be re-coded. For example, traditional implementations of algorithms for computing the fast Fourier transform (FFT) cannot take full advantage of vector architectures. In this article, we present implementations of FFT algorithms that are able to leverage these novel architectures. We evaluate these codes on the NEC SX-Aurora, comparing them with the optimized NEC libraries, and on a prototype of a RISC-V core with a vector processing unit. We present the benefits and limitations of two approaches to vectorizing the radix-2 FFT. We show that our approach makes better use of the vector unit of the NEC SX-Aurora, reaching performance equal to or higher than that of the optimized NEC library. More generally, we demonstrate the importance of maximizing the vector length used by the algorithm, taking advantage of the FFT properties to reduce long-latency vector operations, and reordering instructions according to the specific hardware features to boost the performance of FFT-like computational kernels.
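To illustrate what "maximizing the vector length" can mean for a radix-2 FFT, the following is a minimal scalar sketch, not the implementation evaluated in the article. It assumes a Stockham autosort formulation, chosen here only because every stage then performs N/2 independent butterflies whose inputs are the two contiguous halves of the working array, so a vector unit could in principle process each stage with long unit-stride loads. The function name `fft_radix2_stockham` is hypothetical.

```python
import cmath


def fft_radix2_stockham(x):
    """Radix-2 FFT (Stockham autosort sketch). len(x) must be a power of two."""
    total = len(x)
    if total == 0 or total & (total - 1):
        raise ValueError("length must be a power of two")

    src = [complex(v) for v in x]
    dst = [0j] * total
    half = total // 2

    n = total  # size of the sub-transforms handled in the current stage
    s = 1      # number of interleaved sub-sequences (stride)
    while n > 1:
        # One stage: `half` independent butterflies. The flat index k is the
        # loop a vector unit would strip-mine with the longest available
        # vector length, since a and b are read with unit stride.
        for k in range(half):
            p, q = divmod(k, s)
            a = src[k]
            b = src[k + half]
            w = cmath.exp(-2j * cmath.pi * p / n)  # twiddle factor
            dst[q + 2 * s * p] = a + b
            dst[q + 2 * s * p + s] = (a - b) * w
        src, dst = dst, src  # ping-pong buffers, no bit-reversal pass needed
        n //= 2
        s *= 2
    return src
```

For example, `fft_radix2_stockham([1, 2, 3, 4])` yields values close to `10`, `-2+2j`, `-2`, and `-2-2j`. In this sketch the stores are strided while the loads are contiguous; which side to make contiguous, and how to precompute or reuse twiddle factors, are the kind of hardware-dependent choices the abstract refers to.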
Vizcaino, P., Mantovani, F., Ferrer, R., & Labarta, J. (2023). Acceleration with long vector architectures: Implementation and evaluation of the FFT kernel on NEC SX-Aurora and RISC-V vector extension. In Concurrency and Computation: Practice and Experience (Vol. 35). John Wiley and Sons Ltd. https://doi.org/10.1002/cpe.7424