Deterministic fault injection of distributed systems

1Citations
Citations of this article
1Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Ensuring that a system meets its prescribed specification is a growing challenge that confronts software developers and system engineers. Meeting this challenge is particularly important for distributed systems with strict dependability and timeliness constraints. This paper presents a technique, called script-driven probing and fault injection, for the evaluation and validation of dependable protocols. The proposed approach can be used to demonstrate three aspects of a target protocol: i) detection of design or implementation errors, ii) identification of violations of protocol specifications, and iii) insight into design decisions made by the implementors. To demonstrate the capabilities of this technique, the paper briefly describes a probing and fault injection tool, called the PFI tool, and several experiments on two protocols: the Transmission Control Protocol (TCP) [4, 24] and the Group Membership Protocol (GMP) [19]. The tool can be used to delay, drop, reorder, duplicate, and modify messages. It can also introduce new messages into the system to probe participants. In the case of TCP, we used the PFI tool to duplicate the experiments reported in [7] on several TCP implementations without access to the vendors’ TCP source code in a very short time. We also ran several new experiments that are difficult to perform using past approaches based on packet monitoring and filtering. In the case of GMP, we used the tool to test the fault-tolerance capabilities of an implementation under various failure models including daemon/link crash, send/receive omissions, and timing failures. Furthermore, by selective reordering of messages and spontaneous transmission of new messages, we were able to guide a distributed computation into hard to reach global states without instrumenting the protocol implementation.

Cite

CITATION STYLE

APA

Dawson, S., & Jahanian, F. (1995). Deterministic fault injection of distributed systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 938, pp. 178–196). Springer Verlag. https://doi.org/10.1007/3-540-60042-6_13

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free