Most parallel systems on which MPI is used are now hierarchical, such as systems with SMP nodes. Many papers have shown algorithms that exploit shared memory to optimize collective operations to good effect. But how much of the performance benefit comes from tailoring the algorithm to the hierarchical topology of the system? We describe an implementation of many of the MPI collectives that is based entirely on message-passing primitives and exploits the two-level hierarchy. Our results show that exploiting shared memory directly usually gives only a small additional benefit, and they suggest design approaches for the cases where the benefit is large. © 2009 Springer Berlin Heidelberg.
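To make the two-level idea concrete, the following is a minimal sketch of a topology-aware allreduce, simulated in plain Python rather than written against MPI. It is not the MPICH2 implementation described in the paper. The function name `hierarchical_allreduce`, the `node_of` mapping, the choice of the lowest rank on each node as leader, and the linear reduction among leaders are all assumptions made for illustration.

```python
from collections import defaultdict


def hierarchical_allreduce(values, node_of, op=lambda a, b: a + b):
    """Simulate a two-level allreduce (illustrative sketch, not MPICH2 code).

    values[r]  -- contribution of rank r
    node_of[r] -- hypothetical node id hosting rank r
    Returns (per-rank results, number of inter-node messages).
    """
    # Group ranks by node; the first (lowest) rank on each node acts as leader.
    ranks_on = defaultdict(list)
    for rank, node in enumerate(node_of):
        ranks_on[node].append(rank)

    # Phase 1: intra-node reduce. Each leader combines its node's values.
    partial = {}
    for node, ranks in ranks_on.items():
        acc = values[ranks[0]]
        for r in ranks[1:]:
            acc = op(acc, values[r])
        partial[node] = acc

    # Phase 2: inter-node exchange among leaders only. The leaders reduce
    # to the first leader, which then sends the result back to the others.
    nodes = list(ranks_on)
    total = partial[nodes[0]]
    for node in nodes[1:]:
        total = op(total, partial[node])
    inter_node_messages = 2 * (len(nodes) - 1)

    # Phase 3: intra-node broadcast. Each leader hands the result to its ranks.
    results = [total] * len(values)
    return results, inter_node_messages


if __name__ == "__main__":
    # 8 ranks on 2 nodes, 4 ranks per node.
    res, msgs = hierarchical_allreduce(list(range(8)), [0, 0, 0, 0, 1, 1, 1, 1])
    print(res[0], msgs)  # prints "28 2"
```

In this sketch only the node leaders communicate across nodes, so the inter-node message count depends on the number of nodes rather than the number of ranks. The intra-node phases could use either message passing or shared memory, which is the trade-off the abstract evaluates.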
Zhu, H., Goodell, D., Gropp, W., & Thakur, R. (2009). Hierarchical collectives in MPICH2. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5759 LNCS, pp. 325–326). Springer Verlag. https://doi.org/10.1007/978-3-642-03770-2_41