A transpose-free in-place SIMD optimized FFT

Abstract

A transpose-free in-place SIMD optimized algorithm for the computation of large FFTs is introduced and implemented on the Cell Broadband Engine. Six different FFT implementations of the algorithm using six different data movement methods are described. Their relative performance is compared for input sizes from 2^17 to 2^21 complex floating-point samples. Large differences in performance are observed among even theoretically equivalent data movement patterns. All six implementations compare favorably with FFTW and other previous FFT implementations. © 2012 ACM.
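To make the term "in-place" concrete, here is a minimal sketch of a textbook iterative radix-2 FFT that overwrites its input buffer instead of allocating a second one. It is not the algorithm from the paper: it uses no SIMD, performs no Cell-specific data movement, and relies on a bit-reversal permutation, which the abstract does not describe. The function name `fft_in_place` is hypothetical and chosen only for illustration.

```python
import cmath


def fft_in_place(x):
    """Textbook iterative radix-2 FFT; overwrites the list x and returns it.

    Illustrative only: an assumption-laden sketch, not the paper's method.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    # Bit-reversal permutation done with swaps, so no second buffer is needed.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    # log2(n) butterfly stages, each reading and writing the same buffer.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0.0j
            for k in range(half):
                a = x[start + k]
                b = x[start + k + half] * w
                x[start + k] = a + b
                x[start + k + half] = a - b
                w *= w_step
        size *= 2
    return x
```

The input sizes in the abstract (2^17 to 2^21 samples) are far too large for pure Python to handle quickly, so this sketch is best tried on small power-of-two lengths such as 8 or 1024.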

Cite

Geraci, J. R., & Sacco, S. M. (2012). A transpose-free in-place SIMD optimized FFT. Transactions on Architecture and Code Optimization, 9(3). https://doi.org/10.1145/2355585.2355596
