CMDiagnostor: An Ambiguity-Aware Root Cause Localization Approach Based on Call Metric Data

Abstract

The availability of online services is vital because of its strong relevance to revenue and user experience. To ensure online services' availability, quickly localizing the root causes of system failures is crucial. Given the high resource consumption of traces, existing approaches widely use call metric data to construct call graphs in practice. However, ambiguous correspondences between upstream and downstream calls may exist, causing unexpected edges to be explored in the constructed call graph. Conducting root cause localization on this graph may lead to misjudgments of the real root causes. To the best of our knowledge, we are the first to investigate such ambiguity, which is overlooked in the existing literature. Inspired by the law of large numbers and the Markov properties of network traffic, we propose a regression-based method (named AmSitor) to address this problem effectively. Based on AmSitor, we propose CMDiagnostor, an ambiguity-aware root cause localization approach based on call metric data, containing four modules: metric anomaly detection, ambiguity-free call graph construction, root cause exploration, and candidate root cause ranking. Comprehensive experimental evaluations on real-world datasets show that CMDiagnostor can outperform the state-of-the-art approaches by 14% on the top-5 hit rate. Moreover, AmSitor can also be applied separately to existing baseline approaches to further improve their performance. The source code is released at https://github.com/NetManAIOps/CMDiagnostor.
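To make the ambiguity concrete, the following is a minimal illustrative sketch and not the AmSitor algorithm itself, whose details the abstract does not give. It assumes a hypothetical service that receives two kinds of upstream calls and issues one kind of downstream call, with only per-window call counts available. An ordinary least-squares fit then estimates how many downstream calls each upstream call triggers, and a near-zero weight suggests an edge that should not be explored. The function name, the data, and the 0.05 threshold are all assumptions made for illustration.

```python
def fit_two_upstreams(up_a, up_b, down):
    """Least-squares fit of down[t] ~= w_a * up_a[t] + w_b * up_b[t] (no intercept).

    Solves the 2x2 normal equations directly, so no third-party library is needed.
    """
    saa = sum(a * a for a in up_a)
    sbb = sum(b * b for b in up_b)
    sab = sum(a * b for a, b in zip(up_a, up_b))
    sad = sum(a * d for a, d in zip(up_a, down))
    sbd = sum(b * d for b, d in zip(up_b, down))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("upstream series are collinear; weights are not identifiable")
    w_a = (sad * sbb - sbd * sab) / det
    w_b = (sbd * saa - sad * sab) / det
    return w_a, w_b


# Hypothetical per-minute call counts (made up for this example).
up_a = [100, 120, 80, 150, 90]   # upstream call type A into the service
up_b = [40, 35, 60, 20, 70]      # upstream call type B into the service
down = [100, 120, 80, 150, 90]   # downstream calls issued by the service

w_a, w_b = fit_two_upstreams(up_a, up_b, down)

# Keep only upstream-to-downstream correspondences with a non-negligible weight.
THRESHOLD = 0.05  # assumed cutoff, not taken from the paper
kept_edges = [name for name, w in (("A", w_a), ("B", w_b)) if w > THRESHOLD]
```

In this made-up data the downstream counts track upstream type A exactly, so the fit attributes the downstream call to A and the B correspondence is dropped from the graph before root cause exploration.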

Cite

Yu, Q., Pei, C., Hao, B., Li, M., Li, Z., Zhang, S., … Pei, D. (2023). CMDiagnostor: An Ambiguity-Aware Root Cause Localization Approach Based on Call Metric Data. In ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023 (pp. 2937–2947). Association for Computing Machinery, Inc. https://doi.org/10.1145/3543507.3583302
