Parallel prefix (scan) algorithms for MPI

26Citations
Citations of this article
15Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We describe and experimentally compare four theoretically well-known algorithms for the parallel prefix operation (scan, in MPI terms), and give a presumably novel, doubly-pipelined implementation of the in-order binary tree parallel prefix algorithm. Bidirectional interconnects can benefit from this implementation. We present results from a 32 node AMD Cluster with Myrinet 2000 and a 72-node SX-8 parallel vector system. The doubly-pipelined algorithm is more than a factor two faster than the straight-forward binomial-tree algorithm found in many MPI implementations. However, due to its small constant factors the simple, linear pipeline algorithm is preferable for systems with a moderate number of processors. We also discuss adapting the algorithms to clusters of SMP nodes. © Springer-Verlag Berlin Heidelberg 2006.

Cite

CITATION STYLE

APA

Sanders, P., & Träff, J. L. (2006). Parallel prefix (scan) algorithms for MPI. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4192 LNCS, pp. 49–57). Springer Verlag. https://doi.org/10.1007/11846802_15

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free