Communication-aware approaches for transparent checkpointing in Cloud Computing

Samy Sadi; Belabbas Yagoubi

Journal Article (Open Access)

Scalable Computing (2016) 17(3) 251-270

DOI: 10.12694/scpe.v17i3.1184

Abstract

Checkpoint/Restart, or checkpointing, is a fault tolerance technique that consists of regularly taking snapshots of an application so that, in the event of a failure, the application's state can be restored and its execution resumed without having to restart it from the beginning. The advent of Cloud Computing has brought new challenges for this technique, since fault tolerance must be provided transparently in environments running highly heterogeneous applications. In this context, we propose two new fully transparent checkpointing approaches. Both use communication-induced checkpointing and guarantee a consistent view of the applications with respect to the outside world process. The first approach is uncoordinated and checkpoints applications independently. The second approach is coordinated: applications are first grouped into clusters before the checkpointing process starts. We compared the proposed approaches with state-of-the-art approaches. The results show that our approaches perform better in terms of communication latency and of overhead on the execution of the Virtual Machines.
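The abstract does not detail the authors' protocols. As a generic illustration of what communication-induced checkpointing means, the following minimal sketch implements a classic index-based scheme, not the approaches proposed in the article. It assumes that each process stands for one VM, takes "basic" checkpoints on its own schedule, and piggybacks its current checkpoint index on every message. A receiver that sees a larger index takes a "forced" checkpoint before delivering the message. The `Process` and `Message` classes and their methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    payload: str
    index: int  # checkpoint index piggybacked by the sender


@dataclass
class Process:
    """Hypothetical process (e.g. one VM) running an index-based CIC scheme."""
    name: str
    index: int = 0  # current checkpoint index
    state: int = 0  # stand-in for the application state (event counter)
    checkpoints: list = field(default_factory=list)  # (index, state, kind)

    def _checkpoint(self, kind: str) -> None:
        # Record a snapshot of the local state tagged with the current index.
        self.checkpoints.append((self.index, self.state, kind))

    def basic_checkpoint(self) -> None:
        # Independently scheduled checkpoint: advance the index, then save.
        self.index += 1
        self._checkpoint("basic")

    def send(self, payload: str) -> Message:
        self.state += 1
        return Message(self.name, payload, self.index)

    def receive(self, msg: Message) -> None:
        # If the sender is already in a later checkpoint interval, take a
        # forced checkpoint BEFORE delivering, so the message cannot become
        # an orphan with respect to checkpoints carrying that index.
        if msg.index > self.index:
            self.index = msg.index
            self._checkpoint("forced")
        self.state += 1


a, b = Process("vm-a"), Process("vm-b")
a.basic_checkpoint()   # vm-a moves to index 1
m = a.send("hello")    # the message carries index 1
b.receive(m)           # vm-b is at index 0 < 1, so it is forced to checkpoint
print(b.checkpoints)   # [(1, 0, 'forced')]
```

Forced checkpoints are what make such a scheme communication-induced: processes checkpoint without explicit coordination, yet the message traffic alone keeps checkpoints with matching indices mutually consistent. The sketch deliberately omits interaction with the outside world process; messages sent outside the system cannot be undone on rollback and need separate handling.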

Sadi, S., & Yagoubi, B. (2016). Communication-aware approaches for transparent checkpointing in Cloud Computing. Scalable Computing, 17(3), 251–270. https://doi.org/10.12694/scpe.v17i3.1184