An implementation of parallel 3-D FFT with 2-D decomposition on a massively parallel cluster of multi-core processors

Abstract

In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) with two-dimensional decomposition on a massively parallel cluster of multi-core processors. The proposed parallel three-dimensional FFT algorithm is based on the multicolumn FFT algorithm. We show that a two-dimensional decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes. We successfully achieved a performance of over 401 GFlops on 256 nodes of Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) for 256^3-point FFT. © 2010 Springer-Verlag Berlin Heidelberg.
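
As a rough illustration of what a two-dimensional decomposition means, the following is a minimal serial sketch in Python with NumPy. It is not the paper's implementation: the function name `pencil_fft3d` and the grid parameters `P` and `Q` are hypothetical, there is no MPI, and the data redistribution between stages is only indicated in comments. Each loop iteration stands in for one process of a P x Q grid that transforms the "pencil" of data it would own at that stage. With a one-dimensional (slab) decomposition an N^3-point FFT can use at most N processes, whereas a two-dimensional decomposition allows up to N^2, and each redistribution involves only the P or Q processes in one row or column of the grid rather than all of them.

```python
import numpy as np


def blocks(n, parts):
    """Split range(n) into `parts` equal contiguous slices."""
    step = n // parts
    return [slice(i * step, (i + 1) * step) for i in range(parts)]


def pencil_fft3d(x, P, Q):
    """Serial simulation of a 3-D FFT on a hypothetical P x Q process grid.

    Assumption: every axis length divides evenly by the grid dimension
    that splits it. Each inner loop body is the work of one process.
    """
    n0, n1, n2 = x.shape
    assert n0 % P == 0 and n1 % P == 0 and n1 % Q == 0 and n2 % Q == 0
    a = x.astype(complex)  # working copy; the input is left untouched

    # Stage 1: pencils along axis 0. Axis 1 is split over P, axis 2 over Q.
    for sp in blocks(n1, P):
        for sq in blocks(n2, Q):
            a[:, sp, sq] = np.fft.fft(a[:, sp, sq], axis=0)

    # A real code would now exchange data among the P processes that share
    # the same axis-2 block, so that axis 1 becomes local.
    # Stage 2: pencils along axis 1. Axis 0 is split over P, axis 2 over Q.
    for sp in blocks(n0, P):
        for sq in blocks(n2, Q):
            a[sp, :, sq] = np.fft.fft(a[sp, :, sq], axis=1)

    # A real code would now exchange data among the Q processes that share
    # the same axis-0 block, so that axis 2 becomes local.
    # Stage 3: pencils along axis 2. Axis 0 is split over P, axis 1 over Q.
    for sp in blocks(n0, P):
        for sq in blocks(n1, Q):
            a[sp, sq, :] = np.fft.fft(a[sp, sq, :], axis=2)

    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8, 8))
    # Compare against NumPy's reference 3-D FFT.
    print(np.allclose(pencil_fft3d(x, 2, 4), np.fft.fftn(x)))
```

The sketch only checks that three stages of pencil-local one-dimensional FFTs reproduce the full three-dimensional transform; it says nothing about the communication cost or performance that the paper measures.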

Cite

Takahashi, D. (2010). An implementation of parallel 3-D FFT with 2-D decomposition on a massively parallel cluster of multi-core processors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6067 LNCS, pp. 606–614). https://doi.org/10.1007/978-3-642-14390-8_63
