Abstract
Checkpointing a virtual machine (VM) is a proven technique for improving reliability in modern datacenters. Inspired by the CSMA protocol in wireless congestion control, we propose a novel framework for distributed, contention-free scheduling of VM checkpointing that offers reliability as a transparent, elastic service in datacenters. In this work, we quantify reliability in closed form by studying the system's stationary behavior, and maximize job reliability through utility optimization. We implement a proof-of-concept prototype based on our design. Evaluation results show that the proposed checkpoint scheduling can significantly reduce the performance interference from checkpointing and improve reliability by as much as one order of magnitude over a contention-oblivious scheme. © 2014 ACM.
Cite
Xiang, Y., Liu, H., Lan, T., Huang, H., & Subramaniam, S. (2014). Optimizing job reliability via contention-free, distributed scheduling of VM checkpointing. In DCC 2014 - Proceedings of the ACM SIGCOMM 2014 Workshop on Distributed Cloud Computing (pp. 59–63). Association for Computing Machinery. https://doi.org/10.1145/2627566.2627568