
FFTs in External or Hierarchical Memory

by David H. Bailey
Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89), 1989

Abstract

Conventional algorithms for computing large one-dimensional fast Fourier transforms (FFTs), even those algorithms recently developed for vector and parallel computers, are largely unsuitable for systems with external or hierarchical memory. The principal reason for this is the fact that most FFT algorithms require at least m complete passes through the data set to compute a $2^m$-point FFT. This paper describes some advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) employ strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation. Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly two gigaflops.

Available from hdl.handle.net

FFTs in External or Hierarchical Memory
David H. Bailey
December 30, 1989
Ref: Journal of Supercomputing, vol. 4, no. 1 (March 1990), pp. 23-35
Abstract
Conventional algorithms for computing large one-dimensional fast Fourier transforms
(FFTs), even those algorithms recently developed for vector and parallel computers, are
largely unsuitable for systems with external or hierarchical memory. The principal reason
for this is the fact that most FFT algorithms require at least m complete passes through
the data set to compute a $2^m$-point FFT.
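
To see where the m passes come from, here is a minimal sketch of a textbook iterative radix-2 FFT in Python. It is not code from the paper, and the names `bit_reverse` and `fft_radix2` are illustrative. Each iteration of the outer loop sweeps the entire array once, and the counter shows that a $2^m$-point transform takes m such sweeps.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Textbook radix-2 FFT. Returns (spectrum, number of full passes)."""
    n = len(x)
    m = n.bit_length() - 1
    if n == 0 or (1 << m) != n:
        raise ValueError("length must be a power of two")

    # Reorder the input so the butterflies produce output in natural order.
    a = [0j] * n
    for i in range(n):
        a[bit_reverse(i, m)] = complex(x[i])

    passes = 0
    half = 1
    while half < n:
        # One butterfly stage: every element of `a` is read and written once.
        w_step = cmath.exp(-1j * cmath.pi / half)
        for start in range(0, n, 2 * half):
            w = 1 + 0j
            for k in range(start, start + half):
                t = w * a[k + half]
                a[k + half] = a[k] - t
                a[k] = a[k] + t
                w *= w_step
        passes += 1
        half *= 2
    return a, passes
```

When the array lives on external storage rather than in main memory, each of those stages implies reading and rewriting the whole data set, which is the cost the abstract refers to.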
This paper describes some advanced techniques for computing an ordered FFT on a
computer with external or hierarchical memory. These algorithms (1) require as few as two
passes through the external data set, (2) employ strictly unit stride, long vector transfers
between main memory and external storage, (3) require only a modest amount of scratch
space in main memory, and (4) are well suited for vector and parallel computation.
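
The abstract does not spell out how the pass count is reduced. One well-known factorization with this two-pass shape is the "four-step" FFT, sketched below in Python under several assumptions: the length is written as n = n1 * n2, a naive `dft` stands in for whatever in-memory FFT kernel would really be used, and the names `dft` and `four_step_fft` are illustrative. The sketch works on an ordinary list and does not model external storage, so the strided slice `x[j1::n1]` is only a stand-in for the block transfers and transposes a real out-of-core code would arrange to keep transfers at unit stride.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, standing in for any in-memory FFT kernel."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of x (length n1*n2) using two sweeps over the data."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Sweep 1: row j1 holds x[j1], x[j1 + n1], ...; transform each row
    # (length n2), then scale entry (j1, k2) by exp(-2*pi*i*j1*k2/n).
    rows = []
    for j1 in range(n1):
        row = dft(x[j1::n1])
        rows.append(
            [row[k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n) for k2 in range(n2)]
        )

    # Sweep 2: transform each column (length n1); column k2 supplies
    # the outputs X[k2 + n2*k1] for k1 = 0 .. n1-1, in natural order.
    out = [0j] * n
    for k2 in range(n2):
        col = dft([rows[j1][k2] for j1 in range(n1)])
        for k1 in range(n1):
            out[k2 + n2 * k1] = col[k1]
    return out
```

Each sweep only needs a few rows or columns resident at a time, which is why a decomposition of this kind can get by with modest scratch space and offers independent short transforms to vectorize or distribute across processors.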
Performance figures are included for implementations of some of these algorithms on
Cray supercomputers. Of interest is the fact that a main memory version outperforms the
current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP
systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at
nearly two gigaflops.
The author is with the Numerical Aerodynamic Simulation (NAS) Systems Division at
NASA Ames Research Center, Moffett Field, CA 94035.
