This paper investigates recovery by checkpointing on distributed shared memory systems. We define the notion of consistent global states on a sequentially consistent shared memory system and investigate how consistent checkpoints can be obtained in these systems. In addition, we propose a novel lazy checkpointing approach that allows a controlled degree of concurrency while limiting the amount of rollback propagation during recovery. We first explore the correctness requirements for efficient checkpointing and then develop algorithms that satisfy them. Several interesting properties of checkpointing on distributed shared memory systems are identified. In particular, we show that for low levels of laziness, better concurrency can be achieved by using more stable storage.
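As an illustration only, the following Python sketch checks a simplified consistency condition for a global checkpoint in a shared memory system. A global checkpoint is treated as consistent if no read saved in one process's checkpoint observed a value whose write is missing from the writer's checkpoint. This "no orphan reads" condition is assumed here as the shared-memory analogue of the orphan-message condition from message-passing checkpointing. It is not the paper's definition or algorithm, and it ignores lazy checkpointing entirely. The names `is_consistent`, `histories`, and `checkpoint` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event encoding: ("w", write_id) for a write, ("r", write_id) for a
# read that returned the value produced by write_id. Variable names are omitted
# because this simplified check only needs the reads-from relation.
Event = Tuple[str, str]


def is_consistent(histories: Dict[str, List[Event]],
                  checkpoint: Dict[str, int]) -> bool:
    """Return True if no checkpointed read depends on an uncheckpointed write.

    checkpoint[p] is the length of the prefix of p's history saved in p's checkpoint.
    """
    # Locate every write: write_id -> (writer process, position in its history).
    writes: Dict[str, Tuple[str, int]] = {}
    for proc, events in histories.items():
        for pos, (kind, wid) in enumerate(events):
            if kind == "w":
                writes[wid] = (proc, pos)

    for proc, events in histories.items():
        for kind, wid in events[:checkpoint[proc]]:
            if kind != "r" or wid not in writes:
                continue  # writes, and reads of initial values (assumed), need no check
            writer, pos = writes[wid]
            # The write survives rollback only if it lies in the writer's saved prefix.
            if pos >= checkpoint[writer]:
                return False  # orphan read: the value it observed would be lost
    return True


histories = {
    "P1": [("w", "x1"), ("w", "y1")],
    "P2": [("r", "x1"), ("r", "y1")],
}
print(is_consistent(histories, {"P1": 1, "P2": 1}))  # True
print(is_consistent(histories, {"P1": 1, "P2": 2}))  # False: y1 is not saved by P1
```

In the second call, P2's checkpoint records a read of `y1`, but P1's checkpoint ends before writing `y1`. Rolling back to that global checkpoint would therefore leave P2 holding a value that, from the restored state, was never written.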
Choy, M., Leong, H. V., & Wong, M. H. (1995). On distributed object checkpointing and recovery. In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (pp. 64–73). ACM. https://doi.org/10.1145/224964.224972