Restoring consistent global states of distributed computations

Abstract

We present a mechanism for restoring any consistent global state of a distributed computation. This capability can form the basis of support for rollback and replay of computations, an activity we view as essential in a comprehensive environment for debugging distributed programs. The mechanism records occasional state checkpoints and logs all messages exchanged between processes. It offers flexibility in three ways: any consistent global state of the computation can be restored; execution can be replayed either exactly as it originally occurred or with user-controlled variations; and there is no need to know a priori which states might be of interest. In addition, if checkpoints and logs are written to stable storage, the mechanism can restore states of computations that cause the system to crash.
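
The core idea of combining occasional checkpoints with a complete message log can be illustrated with a small sketch. This is not the paper's implementation. It is a minimal model under several hypothetical assumptions. Each process is deterministic and changes state only when it receives a message. Events are numbered per process starting at 1. A global state is given as a cut that maps each process to how many of its events are included. The names `LoggedMessage`, `Recorder`, and the summing handler are illustrative inventions. A cut is treated as consistent when every message received inside it was also sent inside it. A process is restored by loading its latest checkpoint at or before the cut and replaying the logged messages it received after that checkpoint.

```python
import bisect
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical deterministic handler: (current state, message payload) -> new state.
Handler = Callable[[int, int], int]


@dataclass(frozen=True)
class LoggedMessage:
    sender: str
    send_event: int  # 1-based index of the send event in the sender's history
    receiver: str
    recv_event: int  # 1-based index of the receive event in the receiver's history
    payload: int


class Recorder:
    """Holds occasional checkpoints plus a log of every delivered message."""

    def __init__(self, handler: Handler) -> None:
        self.handler = handler
        # process -> [(event_index, state)], appended in increasing event order
        self.checkpoints: Dict[str, List[Tuple[int, int]]] = {}
        self.log: List[LoggedMessage] = []

    def checkpoint(self, proc: str, event_index: int, state: int) -> None:
        self.checkpoints.setdefault(proc, []).append((event_index, state))

    def record(self, msg: LoggedMessage) -> None:
        self.log.append(msg)

    def is_consistent(self, cut: Dict[str, int]) -> bool:
        # cut[p] = number of p's events included in the global state.
        # Consistent: every message received inside the cut was also sent inside it.
        return all(
            m.send_event <= cut[m.sender]
            for m in self.log
            if m.recv_event <= cut[m.receiver]
        )

    def restore_process(self, proc: str, upto: int) -> int:
        cps = self.checkpoints[proc]
        # Latest checkpoint taken at or before event `upto`.
        i = bisect.bisect_right([e for e, _ in cps], upto) - 1
        if i < 0:
            raise ValueError(f"no checkpoint for {proc} at or before event {upto}")
        start, state = cps[i]
        # Replay, in order, the logged messages received after that checkpoint.
        replay = sorted(
            (m for m in self.log if m.receiver == proc and start < m.recv_event <= upto),
            key=lambda m: m.recv_event,
        )
        for m in replay:
            state = self.handler(state, m.payload)
        return state

    def restore(self, cut: Dict[str, int]) -> Dict[str, int]:
        if not self.is_consistent(cut):
            raise ValueError("inconsistent cut: a message is received but never sent")
        return {p: self.restore_process(p, k) for p, k in cut.items()}


# Example run: P sends 5 to Q, then Q sends 3 back to P. States are running sums.
recorder = Recorder(handler=lambda state, payload: state + payload)
recorder.checkpoint("P", 0, 0)  # initial states count as checkpoints at event 0
recorder.checkpoint("Q", 0, 0)
recorder.record(LoggedMessage("P", 1, "Q", 1, 5))  # P's event 1 -> Q's event 1
recorder.record(LoggedMessage("Q", 2, "P", 2, 3))  # Q's event 2 -> P's event 2
recorder.checkpoint("Q", 2, 5)  # an occasional checkpoint after Q's event 2

print(recorder.restore({"P": 2, "Q": 2}))  # {'P': 3, 'Q': 5}
print(recorder.is_consistent({"P": 2, "Q": 1}))  # False: P received a message Q has not sent
```

Replaying from the most recent checkpoint, rather than from the start of the run, keeps restoration cost proportional to the messages logged since that checkpoint. Supplying a different handler or altered payloads during replay is one conceivable way to model the "user-controlled variations" the abstract mentions. The paper's actual approach may differ.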

Citation

Goldberg, A. P., Gopal, A., Lowry, A., & Strom, R. (1991). Restoring consistent global states of distributed computations. In Proceedings of the 1991 ACM/ONR Workshop on Parallel and Distributed Debugging, PADD 1991 (pp. 144–154). Association for Computing Machinery, Inc. https://doi.org/10.1145/122759.122772