We describe and experimentally compare four theoretically well-known algorithms for the parallel prefix operation (scan, in MPI terms), and give a presumably novel, doubly-pipelined implementation of the in-order binary tree parallel prefix algorithm. Bidirectional interconnects can benefit from this implementation. We present results from a 32 node AMD Cluster with Myrinet 2000 and a 72-node SX-8 parallel vector system. The doubly-pipelined algorithm is more than a factor two faster than the straight-forward binomial-tree algorithm found in many MPI implementations. However, due to its small constant factors the simple, linear pipeline algorithm is preferable for systems with a moderate number of processors. We also discuss adapting the algorithms to clusters of SMP nodes. © Springer-Verlag Berlin Heidelberg 2006.
CITATION STYLE
Sanders, P., & Träff, J. L. (2006). Parallel prefix (scan) algorithms for MPI. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4192 LNCS, pp. 49–57). Springer Verlag. https://doi.org/10.1007/11846802_15
Mendeley helps you to discover research relevant for your work.