Debugging distributed systems with why-across-time provenance

Abstract

Systematically reasoning about the fine-grained causes of events in a real-world distributed system is challenging. Causality, from the distributed systems literature, can be used to compute the causal history of an arbitrary event in a distributed system, but the event’s causal history is an overapproximation of the true causes. Data provenance, from the database literature, precisely describes why a particular tuple appears in the output of a relational query, but data provenance is limited to the domain of static relational databases. In this paper, we present wat-provenance: a novel form of provenance that provides the benefits of causality and data provenance. Given an arbitrary state machine, wat-provenance describes why the state machine produces a particular output when given a particular input. This enables system developers to reason about the causes of events in real-world distributed systems. We observe that automatically extracting the wat-provenance of a state machine is often infeasible. Fortunately, many distributed systems components have simple interfaces from which a developer can directly specify wat-provenance using a technique we call wat-provenance specifications. Leveraging the theoretical foundations of wat-provenance, we implement a prototype distributed debugging framework called Watermelon.
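
To make the idea of explaining a state machine's output concrete, the following is a minimal brute-force sketch and not the paper's algorithm or the Watermelon implementation. It assumes one reading of wat-provenance that the abstract itself does not spell out: a "witness" is a subset of the past inputs such that replaying it, or any larger subset of the full trace that contains it, yields the same output as the full trace, and the provenance is the set of minimal witnesses. The toy key-value store and the names `kv_store`, `is_witness`, and `minimal_witnesses` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kv_store(trace, request):
    """Toy state machine: replay ("set", key, value) inputs, then answer ("get", key)."""
    state = {}
    for op, key, value in trace:
        if op == "set":
            state[key] = value
    _, key = request
    return state.get(key)


def subsets(indices):
    """Yield every subset of the given index tuple, preserving input order."""
    for size in range(len(indices) + 1):
        yield from combinations(indices, size)


def is_witness(machine, trace, request, candidate):
    """Assumed definition: candidate is a witness if every subtrace of `trace`
    that contains it produces the same output as the full trace."""
    expected = machine(trace, request)
    rest = tuple(i for i in range(len(trace)) if i not in candidate)
    for extra in subsets(rest):
        indices = sorted(set(candidate) | set(extra))
        if machine([trace[i] for i in indices], request) != expected:
            return False
    return True


def minimal_witnesses(machine, trace, request):
    """Return the witnesses (as sorted index lists) that have no smaller witness inside them."""
    all_indices = tuple(range(len(trace)))
    witnesses = [set(c) for c in subsets(all_indices)
                 if is_witness(machine, trace, request, c)]
    return [sorted(w) for w in witnesses if not any(v < w for v in witnesses)]


trace = [("set", "x", 1), ("set", "y", 2), ("set", "x", 3)]
print(minimal_witnesses(kv_store, trace, ("get", "x")))  # [[2]]
```

Under these assumptions, only the last write to `x` explains the answer to `get x`, whereas the causal history of the request would include all three inputs. The enumeration is exponential in the trace length, which is in line with the abstract's remark that automatic extraction is often infeasible and that developers may prefer to specify the provenance of a simple interface directly.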

Whittaker, M., Alvaro, P., Teodoropol, C., & Hellerstein, J. M. (2018). Debugging distributed systems with why-across-time provenance. In SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing (pp. 333–346). Association for Computing Machinery, Inc. https://doi.org/10.1145/3267809.3267839
