Flipit: An LLVM based fault injector for HPC

Abstract

High performance computing (HPC) is increasingly subjected to faulty computations. The frequency of silent data corruptions (SDCs) in particular is expected to increase in emerging machines, requiring HPC applications to handle SDCs. In this paper, we propose a robust fault injector, built around an LLVM compiler pass, that allows simulation of SDCs in various applications. Although fault injection locations are enumerated at compile time, their activation happens purely at runtime and is based on a user-provided fault distribution. The robustness of our fault injector lies in the ability to augment the runtime injection logic on a per-application basis. This allows tighter control over the spatial location, temporal location, and probability of injected faults. The usability, scalability, and robustness of our fault injector are demonstrated by injecting faults into an algebraic multigrid solver.
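The split between compile-time site enumeration and runtime activation can be illustrated with a small sketch. This is not FlipIt's actual interface: the names `flip_bit` and `InjectionSite`, the single-bit-flip fault model, and the fixed per-site probability are all assumptions made for illustration. The LLVM pass itself is not shown; each `InjectionSite` simply stands in for one location that such a pass might have instrumented.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Invert one bit (0 = least significant) of a float's IEEE 754 binary64 encoding."""
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range [0, 64)")
    # Reinterpret the double as a 64-bit unsigned integer, XOR the chosen bit,
    # and reinterpret the result as a double again.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


class InjectionSite:
    """Hypothetical stand-in for one injection location enumerated ahead of time.

    Whether a fault actually fires is decided only when the site is reached
    at runtime, using a caller-supplied probability.
    """

    def __init__(self, site_id: int, probability: float, rng: random.Random):
        self.site_id = site_id
        self.probability = probability
        self.rng = rng
        self.log = []  # (site_id, bit) for every fault that was injected

    def maybe_corrupt(self, value: float) -> float:
        # Runtime activation: draw from the RNG each time the site executes.
        if self.rng.random() < self.probability:
            bit = self.rng.randrange(64)
            self.log.append((self.site_id, bit))
            return flip_bit(value, bit)
        return value


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    site = InjectionSite(site_id=7, probability=0.25, rng=rng)
    results = [site.maybe_corrupt(1.0) for _ in range(20)]
    print(f"{len(site.log)} of {len(results)} executions were corrupted")
```

Keeping the activation decision inside a replaceable object is one way to mirror the per-application control the abstract describes: a different `maybe_corrupt` policy could restrict faults to certain sites, time windows, or bit ranges without changing the instrumented code.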

Cite

Calhoun, J., Olson, L., & Snir, M. (2014). Flipit: An LLVM based fault injector for HPC. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8805, pp. 547–558). Springer Verlag. https://doi.org/10.1007/978-3-319-14325-5_47
