In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) with two-dimensional decomposition on a massively parallel cluster of multi-core processors. The proposed parallel three-dimensional FFT algorithm is based on the multicolumn FFT algorithm. We show that a two-dimensional decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes. We successfully achieved a performance of over 401 GFlops on 256 nodes of Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) for 256³-point FFT. © 2010 Springer-Verlag Berlin Heidelberg.
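To make the idea of a two-dimensional decomposition concrete, the following is a minimal serial sketch in Python with NumPy. It is not the paper's implementation: the abstract does not give the data layout or the details of the multicolumn algorithm, so the function name `fft3d_pencil`, the `p` x `q` process grid, and the block layout are all assumptions chosen for illustration. No MPI is used; the two all-to-all exchanges are simulated by slicing and concatenating the blocks that each hypothetical rank would hold.

```python
import numpy as np


def fft3d_pencil(x, p, q):
    """Serially simulate a 3-D FFT with a 2-D decomposition on a
    hypothetical p x q process grid (illustrative layout, not the paper's)."""
    n0, n1, n2 = x.shape
    if n0 % p or n1 % p or n1 % q or n2 % q:
        raise ValueError("grid dimensions must divide the array dimensions")
    b0 = n0 // p    # axis-0 slab width after the first exchange
    b1p = n1 // p   # axis-1 slab width in the initial layout
    b1q = n1 // q   # axis-1 slab width after the second exchange
    b2 = n2 // q    # axis-2 slab width in the initial layout

    # Stage 1: rank (i, j) owns all of axis 0 for one slab of axis 1 and
    # one slab of axis 2, so the axis-0 FFTs need no communication.
    stage1 = {}
    for i in range(p):
        for j in range(q):
            block = x[:, i * b1p:(i + 1) * b1p, j * b2:(j + 1) * b2]
            stage1[(i, j)] = np.fft.fft(block, axis=0)

    # Exchange 1: all-to-all among the p ranks that share the same j.
    # Afterwards rank (i, j) owns all of axis 1, so it can transform it.
    stage2 = {}
    for i in range(p):
        for j in range(q):
            pieces = [stage1[(k, j)][i * b0:(i + 1) * b0, :, :] for k in range(p)]
            stage2[(i, j)] = np.fft.fft(np.concatenate(pieces, axis=1), axis=1)

    # Exchange 2: all-to-all among the q ranks that share the same i.
    # Afterwards rank (i, j) owns all of axis 2, so it can transform it.
    stage3 = {}
    for i in range(p):
        for j in range(q):
            pieces = [stage2[(i, k)][:, j * b1q:(j + 1) * b1q, :] for k in range(q)]
            stage3[(i, j)] = np.fft.fft(np.concatenate(pieces, axis=2), axis=2)

    # Gather the distributed result into one array, only for checking.
    out = np.empty(x.shape, dtype=complex)
    for (i, j), block in stage3.items():
        out[i * b0:(i + 1) * b0, j * b1q:(j + 1) * b1q, :] = block
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((8, 8, 8))
    # Compare against NumPy's direct 3-D FFT on the same input.
    print(np.allclose(fft3d_pencil(data, 2, 4), np.fft.fftn(data)))
```

In this sketch each exchange involves only the `p` or the `q` ranks of one row or column of the process grid, rather than all `p * q` ranks at once. That is the general property of two-dimensional decompositions that the abstract credits with reducing communication time at larger process counts.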
CITATION STYLE
Takahashi, D. (2010). An implementation of parallel 3-D FFT with 2-D decomposition on a massively parallel cluster of multi-core processors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6067 LNCS, pp. 606–614). https://doi.org/10.1007/978-3-642-14390-8_63