Communication-aware approaches for transparent checkpointing in Cloud Computing

Abstract

Checkpoint/Restart, or checkpointing, is a fault tolerance technique that consists of taking frequent snapshots of an application so that, in the event of a failure, the application's state can be restored and its execution continued without necessarily restarting it from the beginning. The advent of Cloud Computing brought new challenges for this technique, as fault tolerance needs to be supplied transparently in environments running highly heterogeneous applications. In this context, we propose two new fully transparent checkpointing approaches. Both approaches use communication-induced checkpointing and guarantee a consistent view of the applications with regard to the outside world process. The first approach is uncoordinated and creates checkpoints for applications independently. The second approach is coordinated: applications are first grouped into clusters before the checkpointing process is started. We have compared the proposed approaches with state-of-the-art approaches. The results show that our approaches perform better in terms of communication latencies and the overhead on the execution of the Virtual Machines.
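
To make the basic checkpoint/restart idea concrete, the following is a minimal Python sketch, not the approach proposed in the article. It checkpoints at the application level inside a single process, whereas the article targets transparent checkpointing of Virtual Machines. The function names, the state layout, and the `fail_at` parameter are all hypothetical and exist only for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    # os.replace swaps the new snapshot in as a single step.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def run(path, n_steps, interval, fail_at=None):
    """Sum 0..n_steps-1, taking a snapshot every `interval` steps.

    `fail_at` is a hypothetical parameter that simulates a crash at that step.
    """
    state = load_checkpoint(path)  # resume from the last snapshot, if any
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run` a second time with the same `path` after a simulated failure resumes from the most recent snapshot, so only the steps performed since that snapshot are repeated.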

Cite

Sadi, S., & Yagoubi, B. (2016). Communication-aware approaches for transparent checkpointing in Cloud Computing. Scalable Computing, 17(3), 251–270. https://doi.org/10.12694/scpe.v17i3.1184
