An implementation of parallel 3-D FFT using short vector SIMP instructions on clusters of PCs

1Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) using short vector SIMD instructions on clusters of PCs. We vectorized FFT kernels using Intel's Streaming SIMD Extensions 2 (SSE2) instructions. We show that a combination of the vectorization and block three-dimensional FFT algorithm improves performance effectively. Performance results of three-dimensional FFTs on a dual Xeon 2.8 GHz PC SMP cluster are reported. We successfully achieved performance of over 5 GFLOPS on a 16-node dual Xeon 2.8 GHz PC SMP cluster. © Springer-Verlag Berlin Heidelberg 2006.

Cite

CITATION STYLE

APA

Takahashi, D., Boku, T., & Sato, M. (2006). An implementation of parallel 3-D FFT using short vector SIMP instructions on clusters of PCs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3732 LNCS, pp. 1159–1167). https://doi.org/10.1007/11558958_139

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free