Systematically reasoning about the fine-grained causes of events in a real-world distributed system is challenging. Causality, from the distributed systems literature, can be used to compute the causal history of an arbitrary event in a distributed system, but the event’s causal history is an overapproximation of the true causes. Data provenance, from the database literature, precisely describes why a particular tuple appears in the output of a relational query, but data provenance is limited to the domain of static relational databases. In this paper, we present wat-provenance: a novel form of provenance that provides the benefits of causality and data provenance. Given an arbitrary state machine, wat-provenance describes why the state machine produces a particular output when given a particular input. This enables system developers to reason about the causes of events in real-world distributed systems. We observe that automatically extracting the wat-provenance of a state machine is often infeasible. Fortunately, many distributed systems components have simple interfaces from which a developer can directly specify wat-provenance using a technique we call wat-provenance specifications. Leveraging the theoretical foundations of wat-provenance, we implement a prototype distributed debugging framework called Watermelon.
Whittaker, M., Alvaro, P., Teodoropol, C., & Hellerstein, J. M. (2018). Debugging distributed systems with why-across-time provenance. In SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing (pp. 333–346). Association for Computing Machinery, Inc. https://doi.org/10.1145/3267809.3267839