Performance is a key factor for big data applications, and much research has been devoted to optimizing these applications. While prior work can diagnose and correct data skew, the problem of computation skew, abnormally high computation costs for a small subset of input data, has been largely overlooked. Computation skew commonly occurs in real-world applications, yet no tool is available for developers to pinpoint its underlying causes. To enable a user to debug applications that exhibit computation skew, we develop PerfDebug, a post-mortem performance debugging tool. PerfDebug automatically finds the input records responsible for such abnormalities in a big data application by reasoning about deviations in performance metrics such as job execution time, garbage collection time, and serialization time. The key to PerfDebug's success is a data provenance-based technique that computes and propagates record-level computation latency, tracking abnormally expensive records throughout the pipeline. Finally, the input records with the largest latency contributions are presented to the user for bug fixing. We evaluate PerfDebug via in-depth case studies and observe that remediations such as removing the single most expensive record or a simple code rewrite can achieve up to 16X performance improvement.
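The core mechanism described in the abstract, propagating record-level latency along data provenance and ranking inputs by their latency contribution, can be illustrated in isolation. Below is a minimal plain-Scala sketch, not PerfDebug's actual Spark-based implementation; all names here (Tagged, timedMap, rankInputs) are hypothetical. Each record carries the IDs of the input records it derives from plus an accumulated latency; each operator adds its per-record compute time, and the inputs are finally ranked by total attributed latency.

object LatencyProvenanceSketch {

  // A record value paired with the IDs of the input records it derives from
  // (its provenance) and the compute time, in nanoseconds, spent producing it.
  final case class Tagged[A](value: A, lineage: Set[Long], latencyNs: Long)

  // Apply f to each record, measuring per-record latency and adding it
  // to the latency already accumulated by upstream operators.
  def timedMap[A, B](records: Seq[Tagged[A]])(f: A => B): Seq[Tagged[B]] =
    records.map { r =>
      val t0  = System.nanoTime()
      val out = f(r.value)
      val dt  = System.nanoTime() - t0
      Tagged(out, r.lineage, r.latencyNs + dt)
    }

  // Attribute each output's latency back to its input records via lineage,
  // returning inputs ranked by total latency contribution, largest first.
  def rankInputs[A](outputs: Seq[Tagged[A]]): Seq[(Long, Long)] =
    outputs
      .flatMap(r => r.lineage.map(id => id -> r.latencyNs))
      .groupMapReduce(_._1)(_._2)(_ + _)
      .toSeq
      .sortBy(p => -p._2)

  def main(args: Array[String]): Unit = {
    // Input record 2 is pathological: its value drives a much longer loop,
    // simulating a single record with abnormally high computation cost.
    val inputs = Seq((1L, 10), (2L, 10000000), (3L, 20)).map {
      case (id, n) => Tagged(n, Set(id), 0L)
    }
    val stage1 = timedMap(inputs)(n => (0 until n).foldLeft(0L)(_ + _))
    val stage2 = timedMap(stage1)(sum => sum.toString)
    rankInputs(stage2).foreach { case (id, ns) =>
      println(s"input record $id: total attributed latency ${ns / 1000} us")
    }
  }
}

Note that timing every record directly, as this self-contained sketch does, would be costly at scale; the abstract indicates that PerfDebug instead reasons about deviations in coarser metrics such as job execution time, garbage collection time, and serialization time, a detail the sketch deliberately simplifies.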
Citation:
Teoh, J., Gulzar, M. A., Xu, G. H., & Kim, M. (2019). PerfDebug: Performance Debugging of Computation Skew in Dataflow Systems. In SoCC 2019 - Proceedings of the ACM Symposium on Cloud Computing (pp. 465–476). Association for Computing Machinery. https://doi.org/10.1145/3357223.3362727