Increasing failures from transient faults necessitate cost-efficient protection mechanisms that can remain always active. We therefore propose a novel prediction-based transient fault protection strategy as a low-cost, software-only technique. Instead of re-executing expensive computations for validation, an output predictor cheaply determines an approximate value for a sequence of computations. When the actual computation and the prediction agree within a predefined acceptable range, the computation is assumed fault-free, and the expensive re-computation can be skipped. With this approach, a significant reduction in dynamic instruction count is possible. Missed faults may occur, but their occurrence can be explicitly kept small by choosing a proper acceptable range. For evaluation, we build an automatic compilation system, called RSkip, that transforms a program into a resilient executable with the prediction-based protection scheme. Over nine compute-intensive benchmarks, prior instruction replication work incurs 2.33× the execution time of unprotected (unreliable) execution. While controlling the loss in protection rate, RSkip reduces the protection overhead to 1.27× by skipping redundant computation in the target loops at a rate of 81.10%.
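The check-or-re-execute idea described in the abstract can be illustrated with a minimal sketch. It assumes a linear-extrapolation predictor over the two previous loop outputs and a relative-tolerance acceptable range; `predict_linear`, `protected_loop`, and `rel_tol` are hypothetical names chosen for illustration, not RSkip's API, and the predictor is not necessarily the one used in the paper.

```python
from typing import Callable, List, Tuple


def predict_linear(history: List[float]) -> float:
    """Hypothetical cheap predictor: linear extrapolation from the last two outputs."""
    if len(history) < 2:
        return history[-1] if history else 0.0
    return 2 * history[-1] - history[-2]


def protected_loop(
    compute: Callable[[int], float], n: int, rel_tol: float = 0.05
) -> Tuple[List[float], int, int]:
    """Run compute(i) for i in range(n) with prediction-based checking.

    Returns (outputs, skipped, recomputed), where `skipped` counts iterations
    whose re-execution was avoided because the prediction agreed.
    """
    outputs: List[float] = []
    skipped = 0
    recomputed = 0
    for i in range(n):
        actual = compute(i)
        predicted = predict_linear(outputs)
        # Assumed acceptable range: relative tolerance around the prediction.
        # The first two iterations have no usable prediction and are always checked.
        in_range = abs(actual - predicted) <= rel_tol * max(abs(predicted), 1e-12)
        if len(outputs) >= 2 and in_range:
            skipped += 1  # treat as fault-free; skip the expensive re-execution
        else:
            # Fall back to duplicate execution and compare the two results.
            recomputed += 1
            if compute(i) != actual:
                raise RuntimeError(f"fault detected at iteration {i}")
        outputs.append(actual)
    return outputs, skipped, recomputed


if __name__ == "__main__":
    outs, skipped, recomputed = protected_loop(lambda i: 3.0 * i + 1.0, 10)
    print(skipped, recomputed)
```

For the smooth loop in the usage example, only the first two iterations are re-executed and the rest are skipped. The trade-off named in the abstract is visible in the tolerance check: a corrupted value that still falls inside `rel_tol` of the prediction is accepted without re-execution (a missed fault), so a tighter range lowers the miss rate at the cost of more re-computation.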
Park, S., Li, S., Zhang, Z., & Mahlke, S. (2020). Low-cost prediction-based fault protection strategy. In CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization (pp. 30–42). Association for Computing Machinery, Inc. https://doi.org/10.1145/3368826.3377920