MPI support for multi-core architectures: Optimized shared memory collectives

Richard L. Graham; Galen Shipman

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5205 LNCS, pp. 130–140 (2008)

DOI: 10.1007/978-3-540-87475-1_21

58 Citations

22 Readers

Abstract

With local core counts on the rise, taking advantage of shared memory to optimize collective operations can improve performance. We study several on-host, shared-memory-optimized algorithms for MPI_Bcast, MPI_Reduce, and MPI_Allreduce, using tree-based and reduce-scatter algorithms. For small-data operations with relatively large synchronization costs, fan-in/fan-out algorithms generally perform best. For large messages, data manipulation constitutes the largest cost, and reduce-scatter algorithms are best for reductions. These optimizations improve performance by up to a factor of three. Memory and cache sharing effects require deliberate process layout and careful radix selection for tree-based methods. © 2008 Springer-Verlag Berlin Heidelberg.
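The two algorithm families named in the abstract can be illustrated with a small single-process simulation. This is only a sketch of the general techniques, not the authors' shared-memory implementation: it does not use MPI or real shared memory, "ranks" are simply list indices, and the function names `fan_in_reduce` and `reduce_scatter_allreduce` are hypothetical.

```python
from typing import List


def fan_in_reduce(values: List[float], radix: int) -> float:
    """Simulate a radix-k fan-in tree sum; values[r] is rank r's contribution.

    Illustrative only: real implementations synchronize through shared-memory
    flags, which is where the choice of radix affects cost.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    partial = list(values)
    n = len(partial)
    stride = 1
    while stride < n:
        # At this tree level, each parent absorbs up to radix - 1 children.
        for parent in range(0, n, stride * radix):
            for k in range(1, radix):
                child = parent + k * stride
                if child < n:
                    partial[parent] += partial[child]
        stride *= radix
    return partial[0]  # the root (rank 0) holds the full sum


def reduce_scatter_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate allreduce (sum) as a reduce-scatter followed by an allgather.

    buffers[r] is rank r's input vector; all vectors have the same length.
    """
    p = len(buffers)
    n = len(buffers[0])
    # Split the vector into p contiguous segments, one owned by each rank.
    bounds = [(r * n // p, (r + 1) * n // p) for r in range(p)]
    # Reduce-scatter: rank r sums only its own segment across all inputs.
    owned = [
        [sum(buf[i] for buf in buffers) for i in range(lo, hi)]
        for lo, hi in bounds
    ]
    # Allgather: every rank collects all reduced segments.
    full = [x for segment in owned for x in segment]
    return [list(full) for _ in range(p)]
```

In the fan-in sketch a larger radix means fewer tree levels but more children per parent, which is the trade-off the abstract refers to as radix selection. In the reduce-scatter sketch each rank performs the arithmetic for only about 1/p of the vector, which is one plausible reason the approach suits large messages where data manipulation dominates.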

Cite

Graham, R. L., & Shipman, G. (2008). MPI support for multi-core architectures: Optimized shared memory collectives. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5205 LNCS, pp. 130–140). https://doi.org/10.1007/978-3-540-87475-1_21
