We explore the post-recovery efficiency of shrinking and non-shrinking recovery schemes on high-performance computing systems using a synthetic benchmark, and study the impact of network topology on post-recovery communication performance. Our experiments on the IBM BG/Q system Mira show that shrinking recovery can deliver up to 7.5% better efficiency for a neighbor communication pattern, because non-shrinking recovery can reduce communication performance. We expected a similar result for our synthetic benchmark with collective communication, but the situation is quite different. Both shrinking and non-shrinking recovery dramatically reduce MPI (MPICH 3.1) performance on collective communication, making it up to 14× worse and swamping any differences between the two approaches. This suggests that making MPI performance less sensitive to irregularity in performance and communicator size is critical for both recovery approaches.
CITATION STYLE
Fang, A., Fujita, H., & Chien, A. A. (2015). Towards understanding Post-recovery efficiency for shrinking and non-shrinking recovery. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9523, pp. 656–668). Springer Verlag. https://doi.org/10.1007/978-3-319-27308-2_53