Debugging large-scale, long-running parallel programs

Dieter Kranzlmüller; Nam Thoai; Jens Volkert

Journal Article, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2330 913-922

DOI: 10.1007/3-540-46080-2_96

Abstract

Cyclic debugging describes error detection techniques in which a program is executed repeatedly to identify the original cause of incorrect runtime behavior. This repetition is especially problematic for large-scale, long-running parallel programs because of the time and processing resources required and the associated computing costs. A combination of techniques that use the event graph model as the main representation of parallel program behavior offers a solution. First, process isolation reduces the number of deployed processes by executing only a subset of the original processes during debugging. Second, an integrated checkpointing mechanism makes it possible to extract limited periods of execution time or to start subsequent program executions at intermediate points. In addition, the event graph enables equivalent program execution in the presence of nondeterminism and makes it possible to investigate the effects of program perturbation induced by the observation functionality. © Springer-Verlag Berlin Heidelberg 2002.
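To make the idea of process isolation on an event graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified event graph in which each receive event is linked to its matching send event. The names `Event`, `EventGraph`, `add_message`, and `external_receives` are hypothetical and chosen only for illustration. Given a subset of processes to isolate, the sketch lists the receive events whose sender lies outside that subset; under this assumption, those are the messages a debugger would have to supply from recorded trace data instead of from a live process.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    process: int  # rank of the process on which the event occurred
    index: int    # position of the event in that process's local order
    kind: str     # "send", "recv", or "local"


@dataclass
class EventGraph:
    # Hypothetical, simplified event graph: a set of events plus
    # message edges that map each receive event to its matching send.
    events: set = field(default_factory=set)
    message_edges: dict = field(default_factory=dict)

    def add_message(self, send: Event, recv: Event) -> None:
        """Record a point-to-point message as a send/receive pair."""
        self.events.update((send, recv))
        self.message_edges[recv] = send

    def external_receives(self, isolated: set) -> list:
        """Receives on isolated processes whose sender is outside the subset."""
        return sorted(
            (
                recv
                for recv, send in self.message_edges.items()
                if recv.process in isolated and send.process not in isolated
            ),
            key=lambda e: (e.process, e.index),
        )


# Three processes passing messages in a ring: 0 -> 1 -> 2 -> 0.
graph = EventGraph()
graph.add_message(Event(0, 0, "send"), Event(1, 0, "recv"))
graph.add_message(Event(1, 1, "send"), Event(2, 0, "recv"))
graph.add_message(Event(2, 1, "send"), Event(0, 1, "recv"))

# Isolate processes 1 and 2; process 0 is not re-executed.
to_replay = graph.external_receives(isolated={1, 2})
print(to_replay)
```

In this example only the receive on process 1 depends on the omitted process 0, so it is the only message that would need to come from the trace; the message from process 1 to process 2 stays inside the isolated subset.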

Cite

Kranzlmüller, D., Thoai, N., & Volkert, J. (2002). Debugging large-scale, long-running parallel programs. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2330, 913–922. https://doi.org/10.1007/3-540-46080-2_96
