Power Log'n'Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols

Abstract

In fault tolerance for parallel and distributed systems, message logging protocols have played a prominent role over the last three decades. Such protocols enable local rollback for recovery from fail-stop errors. Global rollback techniques are straightforward to implement but can recover more slowly than local rollback. Local rollback is more complex but can offer faster recovery. In this work, we study the power and energy efficiency implications of global and local rollback. We propose a power-efficient version of local rollback that uses Dynamic Voltage and Frequency Scaling (DVFS) and clock modulation (CM) to reduce the power consumption of non-critical, blocked processes. Our results for three MPI codes on two parallel systems show that, during the recovery phase, power-efficient local rollback reduces CPU energy waste by up to 50% compared to existing global and local rollback techniques, without introducing significant overheads. Furthermore, we show that the savings accrue on every blocked process, whose number grows linearly with the process count. We estimate that, in settings with high recovery overheads, the proposed local rollback reduces the total energy waste of parallel codes.
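To make the linear-scaling argument concrete, here is a toy energy model of a single recovery phase. It is a sketch under stated assumptions, not the paper's measurement method. It assumes that "energy waste" means the CPU energy spent while one failed process catches up, that every other process stays blocked for the whole recovery, and that blocked MPI calls busy-poll at close to full power. The class `RecoveryPhase` and all power and time values are hypothetical illustrations, not measurements from the paper. DVFS and clock modulation are modelled only as a lower per-process power `p_throttled`, and their transition overheads are ignored.

```python
from dataclasses import dataclass, replace


@dataclass
class RecoveryPhase:
    """Toy CPU energy model of one recovery phase (all inputs hypothetical)."""
    n_procs: int        # total number of MPI processes
    p_active: float     # watts per process while re-executing work
    p_blocked: float    # watts per process blocked in a busy-polling MPI call
    p_throttled: float  # watts per blocked process after DVFS + clock modulation
    t_recover: float    # seconds the failed process needs to catch up

    def global_rollback(self) -> float:
        # Every process rolls back and re-executes at active power.
        return self.n_procs * self.p_active * self.t_recover

    def local_rollback(self) -> float:
        # Only the failed process re-executes; the others wait at full speed.
        blocked = self.n_procs - 1
        return (self.p_active + blocked * self.p_blocked) * self.t_recover

    def power_efficient_local_rollback(self) -> float:
        # Same as local rollback, but the blocked processes are throttled.
        blocked = self.n_procs - 1
        return (self.p_active + blocked * self.p_throttled) * self.t_recover

    def savings_vs_local(self) -> float:
        # Each blocked process contributes (p_blocked - p_throttled) * t_recover,
        # so the savings grow linearly with the blocked-process count.
        return self.local_rollback() - self.power_efficient_local_rollback()


if __name__ == "__main__":
    phase = RecoveryPhase(n_procs=4, p_active=10.0, p_blocked=9.0,
                          p_throttled=4.0, t_recover=60.0)
    print(phase.global_rollback(), phase.local_rollback(),
          phase.power_efficient_local_rollback())
    # → 2400.0 2220.0 1320.0
    for n in (4, 8, 16):
        print(n, replace(phase, n_procs=n).savings_vs_local())
    # → 4 900.0, then 8 2100.0, then 16 4500.0
```

With these made-up inputs, plain local rollback saves little over global rollback because the blocked processes still busy-poll. Throttling those processes is what removes most of the waste, and doubling the process count roughly doubles the absolute savings. In practice, global rollback may also take longer than local rollback, which this single `t_recover` parameter does not capture.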

Citation

Dichev, K., De Sensi, D., Nikolopoulos, D. S., Cameron, K. W., & Spence, I. (2022). Power Log’n’Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols. IEEE Transactions on Parallel and Distributed Systems, 33(6), 1276–1288. https://doi.org/10.1109/TPDS.2021.3107745