Deriving optimal checkpoint protocols for distributed shared memory architectures

Lorenzo Alvisi; Keith Marzullo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (1995), vol. 938, pp. 111–120

DOI: 10.1007/3-540-60042-6_8

Abstract

Uncoordinated checkpointing is one technique used to build processes that can recover to a consistent state after crashing. This technique requires each process to periodically record its state in a checkpoint. Furthermore, the threads executing on each process log any non-deterministic action that they take following the latest checkpointed state. When a process crashes, a new process, initialized with the appropriate recorded local state, is created in its place. The new process restarts executing, and whenever one of its threads confronts a non-deterministic choice, the thread references the log in order to reproduce the same action performed before the crash. Thus, uncoordinated checkpointing implements an abstraction of a resilient process in which the crash of a process is translated into intermittent unavailability of that process. We give a specification of the consistency property “no orphan threads” in the context of multithreaded processes running on a shared memory multiprocessor. We also give a definition of optimality for uncoordinated checkpointing protocols given a memory coherency protocol. We then use this specification to derive an existing uncoordinated checkpoint protocol and show that it is optimal. This protocol assumes that once a process crashes, no further processes crash until the first process completes recovery.
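The checkpoint, log, and replay cycle described in the abstract can be illustrated with a minimal sketch. It is not the protocol derived in the paper: it models a single-threaded process with no shared memory and no orphan detection, and it assumes the checkpoint and the log live on storage that survives a crash. The names `ResilientProcess`, `choose`, `step`, `recover`, and `demo` are hypothetical and chosen only for illustration.

```python
import copy
import random


class ResilientProcess:
    """Toy process that checkpoints its state and logs non-deterministic choices."""

    def __init__(self, seed=None):
        self.state = {"total": 0, "steps": 0}         # volatile state, lost on a crash
        self.checkpoint = copy.deepcopy(self.state)   # assumed to survive a crash
        self.log = []            # choices made since the latest checkpoint (assumed stable)
        self.replay_pos = None   # index into the log while recovering, otherwise None
        self._rng = random.Random(seed)

    def take_checkpoint(self):
        """Record the current state; earlier log entries are no longer needed."""
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def choose(self):
        """A non-deterministic action: read from the log during recovery, else drawn fresh."""
        if self.replay_pos is not None and self.replay_pos < len(self.log):
            value = self.log[self.replay_pos]
            self.replay_pos += 1
            return value
        value = self._rng.randint(1, 100)
        self.log.append(value)
        return value

    def step(self):
        """One unit of work that depends on exactly one non-deterministic choice."""
        self.state["total"] += self.choose()
        self.state["steps"] += 1

    def recover(self):
        """Restore the latest checkpoint, then re-execute using the logged choices."""
        self.state = copy.deepcopy(self.checkpoint)
        self.replay_pos = 0
        for _ in range(len(self.log)):
            self.step()
        self.replay_pos = None


def demo():
    p = ResilientProcess(seed=1)
    p.step()
    p.take_checkpoint()
    p.step()
    p.step()
    before = dict(p.state)
    p.state = None   # simulate a crash: the volatile state is gone
    p.recover()
    return before, p.state
```

Calling `demo()` returns the state before the crash and the state after recovery; the two should be equal, because the replacement process repeats the logged choices instead of drawing new ones. From the outside, the crash then looks like a period of unavailability.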

Citation (APA)

Alvisi, L., & Marzullo, K. (1995). Deriving optimal checkpoint protocols for distributed shared memory architectures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 938, pp. 111–120). Springer Verlag. https://doi.org/10.1007/3-540-60042-6_8
