Parallel prefix (scan) algorithms for MPI

Peter Sanders; Jesper Larsson Träff

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4192 LNCS 49-57

DOI: 10.1007/11846802_15

26 Citations

15 Readers

Abstract

We describe and experimentally compare four theoretically well-known algorithms for the parallel prefix operation (scan, in MPI terms), and give a presumably novel, doubly-pipelined implementation of the in-order binary tree parallel prefix algorithm. Bidirectional interconnects can benefit from this implementation. We present results from a 32-node AMD cluster with Myrinet 2000 and a 72-node SX-8 parallel vector system. The doubly-pipelined algorithm is more than a factor of two faster than the straightforward binomial-tree algorithm found in many MPI implementations. However, due to its small constant factors, the simple linear pipeline algorithm is preferable for systems with a moderate number of processors. We also discuss adapting the algorithms to clusters of SMP nodes. © Springer-Verlag Berlin Heidelberg 2006.
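For readers unfamiliar with the operation, the following minimal Python sketch illustrates what an inclusive scan computes: rank i ends up with the combination of the inputs of ranks 0 through i. It simulates the ranks as list positions and does not use MPI. The function names `linear_chain_scan` and `doubling_scan` are hypothetical, and both are generic textbook schemes chosen for illustration; they are not the pipelined implementations evaluated in the paper, which this record does not describe in detail.

```python
from itertools import accumulate
from operator import add


def linear_chain_scan(values, op=add):
    """Simulate a linear chain: rank i receives the prefix from rank i-1,
    combines it with its own value, and forwards the result to rank i+1."""
    result = []
    carry = None  # prefix received from the left neighbour (none for rank 0)
    for x in values:
        carry = x if carry is None else op(carry, x)
        result.append(carry)
    return result  # p - 1 sequential hand-offs for p ranks


def doubling_scan(values, op=add):
    """Simulate a round-based scheme: in round k, rank i combines the partial
    result held by rank i - 2**k (if that rank exists) with its own."""
    partial = list(values)
    p = len(partial)
    rounds = 0
    dist = 1
    while dist < p:
        # Every rank reads the previous round's values, as if all messages
        # of a round were exchanged simultaneously. The left operand comes
        # from the lower rank, so non-commutative operators stay in order.
        partial = [
            op(partial[i - dist], partial[i]) if i >= dist else partial[i]
            for i in range(p)
        ]
        dist *= 2
        rounds += 1
    return partial, rounds


inputs = [3, 1, 4, 1, 5, 9, 2, 6]          # one value per simulated rank
expected = list(accumulate(inputs, add))   # reference inclusive prefix sums
chain = linear_chain_scan(inputs)
doubled, rounds = doubling_scan(inputs)
```

Under these assumptions, the chain needs p - 1 sequential steps while the round-based scheme needs about log2(p) rounds, which gives a rough intuition for why the best choice can depend on the number of processors and on constant factors.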

Cite

Sanders, P., & Träff, J. L. (2006). Parallel prefix (scan) algorithms for MPI. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4192 LNCS, pp. 49–57). Springer Verlag. https://doi.org/10.1007/11846802_15
