Abstract
Transient hardware faults have become prevalent due to the shrinking size of transistors, leading to silent data corruptions (SDCs). Therefore, HPC applications need to be evaluated (e.g., via fault injections) and protected to meet the reliability target. In the evaluation, the target programs exercise with a set of given inputs which are usually from program benchmark suite. However, these inputs rarely manifest the SDC vulnerabilities, leading to over-optimistic assessment and unexpectedly higher failure rates in production.We propose Peppa-X, which efficiently identifies the test inputs that estimate the bound of program SDC resiliency. Our key insight is that the SDC sensitivity distribution in a program often remains stationary across input space. Thereby, we can guide the search of SDC-bound inputs by a sampled distribution. Our evaluation shows that Peppa-X can identify the SDC-bound input of a program that existing methods cannot find even with 5x more search time.
Author supplied keywords
Cite
CITATION STYLE
Rahman, M. H., Shamji, A., Guo, S., & Li, G. (2021). Peppa-X: Finding program test inputs to bound silent data corruption vulnerability in hpc applications. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. IEEE Computer Society. https://doi.org/10.1145/3458817.3476147
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.