Concurrent programs often encounter failures, such as races, owing to the presence of synchronization faults (bugs). One existing technique to tolerate synchronization faults is to roll back the program to a previous state and re-execute, in the hope that the failure does not recur. Instead of relying on chance, our approach is to control the re-execution in order to avoid a recurrence of the synchronization failure. The control is achieved by tracing information during an execution and using this information to add synchronizations during the re-execution. The approach gives rise to a general problem, called the off-line predicate control problem, which takes a computation and a property specified on the computation, and outputs a “controlled” computation that maintains the property. We solve the predicate control problem for the mutual exclusion property, which is especially important in synchronization fault tolerance.
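To make the idea of controlled re-execution concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a controller has already produced a total order of critical sections from the trace (the hypothetical `schedule` list), which is a stricter simplification than adding synchronizations to a partially ordered computation. The names `controlled_reexecution`, `sections_by_thread`, and `demo` are illustrative assumptions.

```python
import threading


def controlled_reexecution(schedule, sections_by_thread):
    """Re-run critical sections in the order given by `schedule`.

    schedule: list of thread names, one per critical section, in the
        order they should execute (assumed output of a controller).
    sections_by_thread: dict mapping thread name -> list of callables,
        each callable standing in for one critical section.
    """
    turn = 0  # index of the next critical section allowed to run
    cond = threading.Condition()

    def worker(name):
        nonlocal turn
        for section in sections_by_thread[name]:
            with cond:
                # Added synchronization: wait until it is this thread's turn.
                cond.wait_for(
                    lambda: turn < len(schedule) and schedule[turn] == name
                )
            section()  # only the thread whose turn it is reaches this point
            with cond:
                turn += 1  # hand control to the next scheduled section
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(n,))
               for n in sections_by_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo():
    log = []        # order in which critical sections actually ran
    inside = 0      # number of threads currently inside a critical section
    max_inside = 0  # highest value of `inside` observed

    def make(name):
        def section():
            nonlocal inside, max_inside
            inside += 1
            max_inside = max(max_inside, inside)
            log.append(name)
            inside -= 1
        return section

    schedule = ["A", "B", "B", "A"]
    sections = {"A": [make("A"), make("A")], "B": [make("B"), make("B")]}
    controlled_reexecution(schedule, sections)
    return log, max_inside
```

In this sketch the critical sections run outside the condition's lock, so mutual exclusion comes only from the enforced order: the turn counter advances after a section finishes, which keeps two sections from overlapping.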
CITATION STYLE
Tarafdar, A., & Garg, V. K. (1999). Software fault tolerance of concurrent programs using controlled re-execution. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1693, pp. 210–225). Springer Verlag. https://doi.org/10.1007/3-540-48169-9_15