Systematic debugging of concurrent systems using coalesced stack trace graphs

Abstract

A central need during software development of large-scale parallel systems is for tools that help to quickly identify the root causes of bugs. Given the massive scale of these systems, tools that highlight changes, say those introduced across software versions or operating conditions (e.g., inputs, schedules), can prove highly effective in practice. Conventional debuggers, while good at presenting details at the problem site (e.g., a crash), often omit the contextual information needed to identify the root cause of the bug. We present a new approach to collect and coalesce stack traces, leading to an efficient summary display of salient system control flow differences in a graphical form called Coalesced Stack Trace Graphs (CSTGs). CSTGs have helped us debug situations within a computational framework called Uintah that has been deployed at very large scale. In this paper, we detail CSTGs through case studies in the context of Uintah, debugging unexpected behaviors that are caused by different versions of the software or that occur across different time-steps of a system (e.g., due to non-determinism). We show that CSTGs also give conventional debuggers a far more productive and guided role to play.
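The abstract does not say how the traces are coalesced, so the following is only a minimal sketch of the general idea, under our own assumptions: each stack trace is a list of function names (outermost frame first), coalescing counts how often each caller-to-callee edge occurs, and two coalesced graphs are compared edge by edge. The names `coalesce` and `diff`, and the sample traces, are hypothetical and not taken from the paper or from Uintah.

```python
from collections import Counter


def coalesce(traces):
    """Merge stack traces into one graph, stored as (caller, callee) -> count.

    Assumption: each trace is a list of function names, outermost frame first.
    """
    edges = Counter()
    for trace in traces:
        # Consecutive frames in a trace form one call edge.
        for caller, callee in zip(trace, trace[1:]):
            edges[(caller, callee)] += 1
    return edges


def diff(before, after):
    """Return the edges whose counts differ, as edge -> (count_before, count_after)."""
    return {
        edge: (before.get(edge, 0), after.get(edge, 0))
        for edge in sorted(set(before) | set(after))
        if before.get(edge, 0) != after.get(edge, 0)
    }


# Hypothetical traces collected from a working run and a failing run.
working = [["main", "run", "send"], ["main", "run", "recv"]]
failing = [["main", "run", "send"], ["main", "run", "send"]]

changed = diff(coalesce(working), coalesce(failing))
for (caller, callee), (n_before, n_after) in changed.items():
    print(f"{caller} -> {callee}: {n_before} -> {n_after}")
```

In this sketch the unchanged edge `main -> run` drops out of the diff, leaving only the edges whose call counts changed between the two runs, which is the kind of difference a summary display would highlight.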

Cite

de Oliveira, D. C. B., Rakamarić, Z., Gopalakrishnan, G., Humphrey, A., Meng, Q., & Berzins, M. (2015). Systematic debugging of concurrent systems using coalesced stack trace graphs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8967, pp. 317–331). Springer Verlag. https://doi.org/10.1007/978-3-319-17473-0_21
