Optimization of Fast Fourier Transforms on the Blue Gene/L Supercomputer

Abstract

We analyze the bottlenecks in the parallel FFT algorithm and describe optimizations carried out for the algorithm on the Blue Gene/L supercomputer. We identified three avenues for improving the performance of the algorithm: single-node FFT performance, Alltoall collective performance, and overlap of computation and communication. Performance at all three levels has been optimized using the double-hummer intrinsics of the Blue Gene/L CPU, careful ordering and synchronization of messages in Alltoall communications, and suitable interleaving of message exchanges with computations. Using these optimizations, we obtained a 20% performance improvement over the baseline version on the 64-rack Blue Gene/L system. We give a brief overview of the Alltoall optimizations, describe our computation-communication overlap strategy, and present results for strong scaling and weak scaling of parallel FFT on Blue Gene/L. We also discuss the fundamental limits to scaling of the parallel transpose algorithm for computing FFT. © 2008 Springer Berlin Heidelberg.
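To make the structure of a transpose-based parallel FFT concrete, here is a minimal sketch in plain Python. It is a generic textbook decomposition, not the authors' Blue Gene/L code. It assumes a 1D FFT of length N = n1 * n2 with n1 and n2 both divisible by the number of simulated processes. A local FFT phase on rows is followed by an Alltoall transpose and then a local FFT phase on columns. The names `dft`, `alltoall`, and `transpose_fft` are hypothetical. The naive DFT stands in for the optimized single-node kernel, and the Alltoall is simulated in memory rather than performed over MPI. The sketch does not model the double-hummer kernels or the overlap of communication with computation described above.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in for the optimized single-node FFT kernel."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def alltoall(send):
    """Simulated Alltoall: send[p][q] is the block process p sends to process q.
    Returns recv with recv[q][p] == send[p][q]."""
    procs = len(send)
    return [[send[p][q] for p in range(procs)] for q in range(procs)]


def transpose_fft(x, n1, n2, procs):
    """Hypothetical transpose-based 1D FFT of len(x) == n1 * n2 over `procs`
    simulated processes, using input index n = n2*i1 + i2 and output index
    k = k1 + n1*k2 (Cooley-Tukey decomposition)."""
    n = n1 * n2
    assert len(x) == n and n1 % procs == 0 and n2 % procs == 0
    rows = n2 // procs  # i2 values owned by each process before the transpose
    cols = n1 // procs  # k1 values owned by each process after the transpose

    # Phase 1 (local): process p handles i2 in [p*rows, (p+1)*rows).
    send = []
    for p in range(procs):
        local = []
        for r in range(rows):
            i2 = p * rows + r
            a = dft(x[i2::n2])  # length-n1 DFT over i1 (stride-n2 elements)
            # Multiply by the twiddle factor exp(-2*pi*i * i2*k1 / n).
            local.append([a[k1] * cmath.exp(-2j * cmath.pi * i2 * k1 / n)
                          for k1 in range(n1)])
        # Cut the owned rows into column blocks, one destined for each process q.
        send.append([[row[q * cols:(q + 1) * cols] for row in local]
                     for q in range(procs)])

    recv = alltoall(send)  # the global transpose (the communication step)

    # Phase 2 (local): process q now holds every i2 for its block of k1 values.
    out = [0j] * n
    for q in range(procs):
        for c in range(cols):
            k1 = q * cols + c
            # Concatenating blocks in sender order restores i2 = p*rows + r.
            column = [recv[q][p][r][c] for p in range(procs) for r in range(rows)]
            y = dft(column)  # length-n2 DFT over i2
            for k2 in range(n2):
                out[k1 + n1 * k2] = y[k2]
    return out


if __name__ == "__main__":
    x = [complex(i % 5, (i * 3) % 7) for i in range(24)]
    y = transpose_fft(x, n1=6, n2=4, procs=2)
    # Compare against a direct DFT; the difference should be at round-off level.
    print(max(abs(a - b) for a, b in zip(y, dft(x))))
```

In a real distributed run, each process would hold only its own rows and `alltoall` would be an MPI Alltoall. That single exchange is the collective whose ordering, synchronization, and overlap with the local FFT phases the abstract identifies as the main levers for performance.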

Citation

Sabharwal, Y., Garg, S. K., Garg, R., Gunnels, J. A., & Sahoo, R. K. (2008). Optimization of fast Fourier transforms on the Blue Gene/L supercomputer. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5374 LNCS, pp. 309–322). Springer Verlag. https://doi.org/10.1007/978-3-540-89894-8_29