Detecting and reproducing error-code propagation bugs in MPI implementations

Abstract

We present an approach to automatically detect and reproduce error-code propagation bugs in MPI implementations. Specifically, we combine static analysis and program repair for bug detection, and apply fault injection to reproduce error-propagation bugs found in MPI libraries written in C. We demonstrate our approach on the MPICH library, one of the most popular implementations of MPI, and the MPICH-based implementation MVAPICH, uncovering 447 previously unknown bugs. We discovered that 31 of these bugs result in program crashes, and that 60% of the MPICH test suite is susceptible to crashing due to failures to propagate error codes. Moreover, 95 bugs produce undesirable behavior that has been confirmed dynamically: they cause tests to fail, hang processes, or simply drop error codes before they reach user applications.
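
To make the bug class concrete, the following is a minimal sketch in C of what a dropped error code can look like and how a simple fault-injection switch can expose it. It is not taken from the paper or from MPICH: every name here (`demo_alloc_buffer`, `demo_init_buggy`, `demo_init_fixed`, `demo_inject_alloc_failure`, and the `DEMO_*` codes) is hypothetical, and the one-shot injection flag is an assumed stand-in for whatever fault-injection mechanism the authors actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch; a real MPI library defines its own. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_NO_MEM = 1 };

/* Assumed fault-injection switch: when nonzero, the next allocation is forced to fail. */
int demo_inject_alloc_failure = 0;

/* Allocates a buffer and reports failure through the return code. */
int demo_alloc_buffer(size_t n, char **out)
{
    assert(out != NULL);
    *out = NULL;
    if (demo_inject_alloc_failure) {
        demo_inject_alloc_failure = 0;  /* one-shot injected fault */
        return DEMO_ERR_NO_MEM;
    }
    *out = malloc(n);
    return *out ? DEMO_SUCCESS : DEMO_ERR_NO_MEM;
}

/* Buggy caller: the callee's error code is ignored, so success is always reported. */
int demo_init_buggy(char **buf)
{
    demo_alloc_buffer(64, buf);  /* return value dropped here */
    return DEMO_SUCCESS;
}

/* Repaired caller: the error code is checked and propagated upward. */
int demo_init_fixed(char **buf)
{
    int err = demo_alloc_buffer(64, buf);
    if (err != DEMO_SUCCESS)
        return err;
    return DEMO_SUCCESS;
}
```

With `demo_inject_alloc_failure` set to 1, `demo_init_buggy` still returns `DEMO_SUCCESS` while leaving the buffer pointer `NULL`, so a caller that trusts the return code may later dereference a null pointer. Under the same injected fault, `demo_init_fixed` returns `DEMO_ERR_NO_MEM`, which lets the error reach the caller.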

Citation (APA)

DeFreez, D., Bhowmick, A., Laguna, I., & Rubio-González, C. (2020). Detecting and reproducing error-code propagation bugs in MPI implementations. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (pp. 187–201). Association for Computing Machinery. https://doi.org/10.1145/3332466.3374515
