The fault detection problem

Andreas Haeberlen; Petr Kuznetsov

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5923 LNCS, pp. 99–114 (2009)

DOI: 10.1007/978-3-642-10877-8_10

Abstract

One of the most important challenges in distributed computing is ensuring that services are correct and available despite faults. Recently it has been argued that fault detection can be factored out from computation, and that a generic fault detection service can be a useful abstraction for building distributed systems. However, while fault detection has been extensively studied for crash faults, little is known about detecting more general kinds of faults. This paper explores the power and the inherent costs of generic fault detection in a distributed system. We propose a formal framework that allows us to partition the set of all faults that can possibly occur in a distributed computation into several fault classes. Then we formulate the fault detection problem for a given fault class, and we show that this problem can be solved for only two specific fault classes, namely omission faults and commission faults. Finally, we derive tight lower bounds on the cost of solving the problem for these two classes in asynchronous message-passing systems. © 2009 Springer-Verlag.

Cite

Haeberlen, A., & Kuznetsov, P. (2009). The fault detection problem. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5923 LNCS, pp. 99–114). https://doi.org/10.1007/978-3-642-10877-8_10
