Software fault tolerance of concurrent programs using controlled re-execution

Abstract

Concurrent programs often encounter failures, such as races, owing to the presence of synchronization faults (bugs). One existing technique to tolerate synchronization faults is to roll back the program to a previous state and re-execute, in the hope that the failure does not recur. Instead of relying on chance, our approach is to control the re-execution in order to avoid a recurrence of the synchronization failure. The control is achieved by tracing information during an execution and using this information to add synchronizations during the re-execution. The approach gives rise to a general problem, called the off-line predicate control problem, which takes a computation and a property specified on the computation, and outputs a “controlled” computation that maintains the property. We solve the predicate control problem for the mutual exclusion property, which is especially important in synchronization fault tolerance.
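
The idea of adding synchronizations to a traced computation can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes a simplified model in which the trace has already been reduced to named critical sections, each tagged with its process, plus a set of pairs stating that one critical section cannot be placed after another. The function name `control_for_mutual_exclusion` and the input shapes are hypothetical, chosen only for this illustration. Under those assumptions, any order of the critical sections that respects the constraints can be enforced by adding one synchronization between each pair of consecutive sections on different processes.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def control_for_mutual_exclusion(sections, must_precede):
    """Serialize traced critical sections (simplified, hypothetical model).

    sections: dict mapping critical-section name -> process id, listed in
        per-process program order.
    must_precede: set of (a, b) pairs meaning the trace forces section a
        to come before section b.
    Returns (order, sync_edges), or None if no serialization exists.
    """
    # Map each section to the set of sections that must come before it.
    predecessors = {name: set() for name in sections}

    # Sections on the same process keep their program order.
    last_on_process = {}
    for name, proc in sections.items():
        if proc in last_on_process:
            predecessors[name].add(last_on_process[proc])
        last_on_process[proc] = name

    # Add the dependencies recorded in the trace.
    for a, b in must_precede:
        predecessors[b].add(a)

    try:
        order = list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None  # constraints are cyclic: no controlled order exists

    # One added synchronization "end of a -> start of b" per consecutive
    # pair on different processes; same-process pairs are already ordered.
    sync_edges = [
        (a, b) for a, b in zip(order, order[1:]) if sections[a] != sections[b]
    ]
    return order, sync_edges


sections = {"p1_cs1": "P1", "p1_cs2": "P1", "p2_cs1": "P2"}
result = control_for_mutual_exclusion(sections, {("p2_cs1", "p1_cs2")})
```

Here `result` holds one admissible order of the three critical sections and the synchronizations a re-execution would add to enforce it; a cyclic set of constraints yields `None`, meaning that in this simplified model no amount of added synchronization can make the sections mutually exclusive.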

Cite

Tarafdar, A., & Garg, V. K. (1999). Software fault tolerance of concurrent programs using controlled re-execution. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1693, pp. 210–225). Springer Verlag. https://doi.org/10.1007/3-540-48169-9_15