Active optimistic message logging for reliable execution of MPI applications

This article is free to access.

Abstract

To execute MPI applications reliably, fault tolerance mechanisms are needed. Message logging is a well-known solution for providing fault tolerance to MPI applications. It has been proved that it can tolerate a higher failure rate than coordinated checkpointing. However, pessimistic and causal message logging can induce high overhead on failure-free execution. In this paper, we present O2P, a new optimistic message logging protocol based on active optimistic message logging. Unlike existing optimistic message logging protocols, which save dependency information on reliable storage periodically, O2P logs dependency information as soon as possible to reduce the amount of data piggybacked on application messages. It thus reduces the overhead of the protocol on failure-free execution, making it more scalable and simplifying recovery. O2P is implemented as a module of the Open MPI library. Experiments show that active message logging is a promising way to improve the scalability and performance of optimistic message logging. © 2009 Springer.
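The contrast between periodic and as-soon-as-possible logging can be illustrated with a toy model. The sketch below is not the O2P implementation and does not use Open MPI; the names `Process`, `flush_every`, and `run` are hypothetical, and logging is assumed to complete instantly, whereas a real protocol logs asynchronously and waits for acknowledgements. It only counts how many not-yet-stable dependency records (determinants) would have to be piggybacked on outgoing messages under each policy.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    # Hypothetical knob: 1 means "log every determinant right away",
    # larger values model periodic logging.
    flush_every: int
    unstable: list = field(default_factory=list)  # determinants not yet on reliable storage
    stable: list = field(default_factory=list)    # stand-in for reliable storage
    piggybacked: int = 0                          # total determinants attached to sends

    def deliver(self, sender: int, seq: int) -> None:
        # Record the delivery event, then flush if the threshold is reached.
        # Assumption: the flush is instantaneous (no network or ack latency).
        self.unstable.append((sender, seq))
        if len(self.unstable) >= self.flush_every:
            self.stable.extend(self.unstable)
            self.unstable.clear()

    def send(self) -> list:
        # Every determinant that is not yet stable travels with the message.
        self.piggybacked += len(self.unstable)
        return list(self.unstable)


def run(flush_every: int, n_messages: int = 100) -> int:
    """Deliver n_messages, sending once after each delivery; return the piggyback count."""
    p = Process(flush_every)
    for seq in range(n_messages):
        p.deliver(sender=0, seq=seq)
        p.send()
    return p.piggybacked


if __name__ == "__main__":
    print("log immediately:", run(1))
    print("log every 10 events:", run(10))
```

In this idealized setting, logging immediately leaves nothing to piggyback, while flushing every 10 events attaches a growing backlog to each message between flushes. Real savings depend on logging latency and communication patterns, which the model ignores.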

Cite

Ropars, T., & Morin, C. (2009). Active optimistic message logging for reliable execution of MPI applications. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5704 LNCS, pp. 615–626). https://doi.org/10.1007/978-3-642-03869-3_58
