Optimizing job reliability via contention-free, distributed scheduling of VM checkpointing


Abstract

Checkpointing a virtual machine (VM) is a proven technique for improving reliability in modern datacenters. Inspired by the CSMA (carrier-sense multiple access) protocol used for medium access control in wireless networks, we propose a novel framework for distributed, contention-free scheduling of VM checkpointing that offers reliability as a transparent, elastic service in datacenters. In this work, we quantify job reliability in closed form by analyzing the system's stationary behavior, and we maximize job reliability through utility optimization. We implement a proof-of-concept prototype based on our design. Evaluation results show that the proposed checkpoint scheduling significantly reduces the performance interference caused by checkpointing and improves reliability by up to one order of magnitude over a contention-oblivious scheme. © 2014 ACM.
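To make the CSMA analogy and the idea of a closed-form stationary analysis concrete, here is a minimal Python sketch of a toy model. It is not the authors' formulation. It assumes that N VMs share one checkpoint channel (for example, storage bandwidth) and that carrier sensing is ideal, so at most one VM checkpoints at a time. It also assumes exponential backoff timers and checkpoint durations with hypothetical rates `backoff_rates` (λ_i) and `ckpt_rates` (μ_i). Under these assumptions the system is a continuous-time Markov chain with the closed-form stationary distribution π_i = (λ_i/μ_i)·π_0, where π_0 = 1/(1 + Σ_j λ_j/μ_j). The event-driven simulation checks that closed form numerically.

```python
import random


def stationary_checkpoint_probs(backoff_rates, ckpt_rates):
    """Closed-form stationary distribution of a toy CSMA-style CTMC (assumed model).

    State 0: shared checkpoint channel idle.
    State i: VM i is checkpointing; all other VMs sense the channel busy and defer.
    Transitions: idle -> i at rate backoff_rates[i]; i -> idle at rate ckpt_rates[i].
    """
    loads = [lam / mu for lam, mu in zip(backoff_rates, ckpt_rates)]
    p_idle = 1.0 / (1.0 + sum(loads))
    return p_idle, [p_idle * rho for rho in loads]


def checkpoint_frequencies(backoff_rates, ckpt_rates):
    """Long-run rate at which each VM starts a checkpoint: pi_0 * lambda_i."""
    p_idle, _ = stationary_checkpoint_probs(backoff_rates, ckpt_rates)
    return [p_idle * lam for lam in backoff_rates]


def simulate(backoff_rates, ckpt_rates, horizon=50_000.0, seed=1):
    """Event-driven simulation of the same chain.

    Returns the fraction of time each VM spends checkpointing.
    """
    rng = random.Random(seed)
    n = len(backoff_rates)
    busy_time = [0.0] * n
    total_lam = sum(backoff_rates)
    t, state = 0.0, None  # None means the channel is idle
    while t < horizon:
        if state is None:
            # Independent exponential backoff timers race; the minimum is
            # Exp(sum of rates) and VM i wins with probability lambda_i / sum.
            t += rng.expovariate(total_lam)
            state = rng.choices(range(n), weights=backoff_rates)[0]
        else:
            # The winning VM holds the channel for one checkpoint duration.
            duration = rng.expovariate(ckpt_rates[state])
            busy_time[state] += duration
            t += duration
            state = None
    return [b / t for b in busy_time]


if __name__ == "__main__":
    lam = [0.5, 1.0, 2.0]  # hypothetical backoff rates
    mu = [4.0, 4.0, 4.0]   # hypothetical checkpoint completion rates
    p0, probs = stationary_checkpoint_probs(lam, mu)
    print("closed form:", p0, probs)
    print("simulated:  ", simulate(lam, mu))
    print("checkpoint start rates:", checkpoint_frequencies(lam, mu))
```

With the rates in the example, π_0 = 1/1.875 ≈ 0.533, and each VM's checkpoint-start rate π_0·λ_i grows linearly with its backoff rate. These backoff rates are the kind of knob a utility-optimization step could tune. The sketch does not model the paper's mapping from checkpoint frequency to job reliability.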

Citation (APA)

Xiang, Y., Liu, H., Lan, T., Huang, H., & Subramaniam, S. (2014). Optimizing job reliability via contention-free, distributed scheduling of VM checkpointing. In DCC 2014 - Proceedings of the ACM SIGCOMM 2014 Workshop on Distributed Cloud Computing (pp. 59–63). Association for Computing Machinery. https://doi.org/10.1145/2627566.2627568