With the growing market for cloud-native applications, microservices architectures are widely used for rapid, automated deployment, scaling, and management. However, diagnosing faults across numerous services has become highly complex for operators. To tackle this, we present MicroCBR, a microservices troubleshooting framework that uses historical faults from a knowledge base to construct a spatio-temporal knowledge graph offline and then troubleshoots online through case-based reasoning. Compared to existing frameworks, MicroCBR (1) takes advantage of heterogeneous data to fingerprint faults, (2) extracts a spatio-temporal knowledge graph from only one sample per fault, and (3) handles novel faults through hierarchical reasoning and, thanks to the case-based reasoning paradigm, incrementally adds them to the fault knowledge base. Our framework is explainable to operators, who can easily locate root causes and refer to historical solutions. We also run fault experiments on three different microservices architectures on the Grid’5000 testbed. The results show that MicroCBR achieves 91% top-1 accuracy and outperforms three state-of-the-art methods. We report success stories from a real cloud platform, and the code is open-sourced.
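To make the case-based reasoning paradigm mentioned above more concrete, here is a minimal, generic sketch of a retrieve-and-retain loop. It is not the MicroCBR implementation: the class names, the fault names, the symptom labels, and the use of Jaccard similarity over flat symptom sets are all assumptions made for illustration, and the sketch ignores the spatio-temporal knowledge graph and hierarchical reasoning entirely.

```python
from dataclasses import dataclass, field


@dataclass
class FaultCase:
    """One historical fault (hypothetical structure, not from the paper)."""
    name: str
    fingerprint: frozenset  # observed symptoms, e.g. metric or log anomalies
    solution: str


def jaccard(a, b):
    """Set overlap in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def retrieve(self, symptoms, top_k=3):
        """Rank stored cases by similarity to the observed symptoms."""
        scored = [(jaccard(symptoms, c.fingerprint), c) for c in self.cases]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

    def retain(self, case):
        """Store a newly diagnosed fault so later queries can match it."""
        self.cases.append(case)


# Hypothetical knowledge base with one stored sample per fault.
kb = CaseBase()
kb.retain(FaultCase("cpu_stress",
                    frozenset({"cpu_high", "latency_high"}),
                    "scale out the pod"))
kb.retain(FaultCase("network_delay",
                    frozenset({"latency_high", "timeout_logs"}),
                    "check the network policy"))

# Symptoms observed online (also hypothetical).
observed = frozenset({"cpu_high", "latency_high", "oom_events"})
ranked = kb.retrieve(observed)
best_score, best_case = ranked[0]
print(best_case.name, best_case.solution)
```

In this toy setup the ranked list gives an operator both a candidate root cause and the solution recorded for it, and `retain` is how a newly diagnosed fault would be added to the knowledge base for future queries.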
CITATION STYLE
Liu, F., Wang, Y., Li, Z., Ren, R., Guan, H., Yu, X., … Xie, G. (2022). MicroCBR: Case-Based Reasoning on Spatio-temporal Fault Knowledge Graph for Microservices Troubleshooting. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13405 LNAI, pp. 224–239). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-14923-8_15