CauseInfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems

Abstract

Modern applications, especially cloud-based or cloud-centric ones, typically consist of many components that run in large distributed environments and interact in complex ways. They are prone to performance and availability problems caused by a highly dynamic runtime environment, including resource hogs, configuration changes, and software bugs. To make software maintenance more efficient and to provide hints for locating software bugs, we build CauseInfer, a low-cost, black-box cause inference system that requires no instrumentation of the application source code. CauseInfer automatically constructs a two-layer hierarchical causality graph and infers the causes of performance problems along the causal paths in the graph using a series of statistical methods. In an experimental evaluation in a controlled environment, CauseInfer achieves an average of 80% precision and 85% recall in identifying root causes within a list of the top two candidate causes, which is higher than several state-of-the-art methods, and it scales well in distributed systems. © 2014 IEEE.
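
The idea of inferring causes "along the causal paths in the graph" can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how the graph is built or how anomalies are detected, so the function name `find_root_causes`, the metric names, and the rule "a candidate root cause is an anomalous node with no anomalous upstream cause" are all assumptions made for illustration.

```python
from collections import deque


def find_root_causes(causes_of, anomalous, symptom):
    """Walk upstream from `symptom`, following only anomalous nodes.

    causes_of: dict mapping a node to the nodes assumed to influence it
               (hypothetical causality graph, not produced by CauseInfer).
    anomalous: set of nodes flagged as abnormal by some detector (assumed).
    Returns the anomalous nodes reached that have no anomalous cause
    of their own, in breadth-first order.
    """
    roots = []
    seen = {symptom}
    queue = deque([symptom])
    while queue:
        node = queue.popleft()
        # Keep only the upstream causes that are themselves anomalous.
        upstream = [p for p in causes_of.get(node, []) if p in anomalous]
        if not upstream:
            # Nothing abnormal further upstream: treat as a candidate root.
            roots.append(node)
        for parent in upstream:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


# Hypothetical metrics: a slow front end caused by disk I/O on a database.
causes_of = {
    "web.response_time": ["web.cpu", "db.response_time"],
    "db.response_time": ["db.disk_io", "db.memory"],
}
anomalous = {"web.response_time", "db.response_time", "db.disk_io"}

print(find_root_causes(causes_of, anomalous, "web.response_time"))
```

In this toy graph the walk skips `web.cpu` and `db.memory` because they are not flagged, and stops at `db.disk_io`, the only flagged node with no flagged cause behind it.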

Cite

Chen, P., Qi, Y., Zheng, P., & Hou, D. (2014). CauseInfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems. In Proceedings - IEEE INFOCOM (pp. 1887–1895). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/INFOCOM.2014.6848128
