In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) using short vector SIMD instructions on clusters of PCs. We vectorized FFT kernels using Intel's Streaming SIMD Extensions 2 (SSE2) instructions. We show that combining this vectorization with a block three-dimensional FFT algorithm effectively improves performance. Performance results of three-dimensional FFTs on a dual Xeon 2.8 GHz PC SMP cluster are reported. We achieved over 5 GFLOPS on a 16-node dual Xeon 2.8 GHz PC SMP cluster. © Springer-Verlag Berlin Heidelberg 2006.
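The abstract does not reproduce the authors' vectorized kernels. The following is a minimal, hypothetical sketch of what SSE2 vectorization of an FFT kernel can look like. It assumes double-precision complex data stored interleaved as (re, im) pairs, so one complex value fills one 128-bit `__m128d` register. The names `cmul_sse2`, `radix2_butterfly_sse2` and `radix2_stage_sse2` are illustrative and are not taken from the paper. SSE2 has no `addsubpd` instruction (that arrived with SSE3), so the sketch flips one sign with an XOR mask instead.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: complex multiply of a = [a.re, a.im] and b = [b.re, b.im].
   Result: [a.re*b.re - a.im*b.im, a.re*b.im + a.im*b.re]. */
static __m128d cmul_sse2(__m128d a, __m128d b)
{
    __m128d are   = _mm_unpacklo_pd(a, a);     /* [a.re, a.re] */
    __m128d aim   = _mm_unpackhi_pd(a, a);     /* [a.im, a.im] */
    __m128d bswap = _mm_shuffle_pd(b, b, 1);   /* [b.im, b.re] */
    __m128d t1 = _mm_mul_pd(are, b);           /* [a.re*b.re, a.re*b.im] */
    __m128d t2 = _mm_mul_pd(aim, bswap);       /* [a.im*b.im, a.im*b.re] */
    /* No addsub in SSE2: flip the sign bit of the low lane only.
       _mm_set_pd takes (high, low), so the low lane gets -0.0. */
    const __m128d neg_lo = _mm_set_pd(0.0, -0.0);
    t2 = _mm_xor_pd(t2, neg_lo);               /* [-a.im*b.im, a.im*b.re] */
    return _mm_add_pd(t1, t2);
}

/* Hypothetical radix-2 butterfly on interleaved complex doubles:
   u = x[j], v = w * x[j+half];  x[j] <- u + v,  x[j+half] <- u - v. */
static void radix2_butterfly_sse2(double *x, size_t j, size_t half, const double *w)
{
    __m128d u = _mm_loadu_pd(x + 2 * j);
    __m128d v = cmul_sse2(_mm_loadu_pd(x + 2 * (j + half)), _mm_loadu_pd(w));
    _mm_storeu_pd(x + 2 * j, _mm_add_pd(u, v));
    _mm_storeu_pd(x + 2 * (j + half), _mm_sub_pd(u, v));
}

/* Hypothetical radix-2 stage over one block of 2*half complex points;
   w holds 'half' precomputed twiddle factors, interleaved as (re, im). */
static void radix2_stage_sse2(double *x, size_t half, const double *w)
{
    for (size_t j = 0; j < half; ++j)
        radix2_butterfly_sse2(x, j, half, w + 2 * j);
}
```

Because each `__m128d` holds exactly one complex double, this layout vectorizes the arithmetic inside a single complex multiply rather than across independent transforms. A split (separate real and imaginary arrays) layout that processes two transforms per register is a common alternative.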
Takahashi, D., Boku, T., & Sato, M. (2006). An implementation of parallel 3-D FFT using short vector SIMD instructions on clusters of PCs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3732 LNCS, pp. 1159–1167). https://doi.org/10.1007/11558958_139