The microservice architecture is widely employed in large Internet systems. For each user request, several microservices are called, and a trace is formed to record the tree-like call dependencies among them and the time consumed at each call node. Traces are useful for diagnosing system failures, but their complex structure makes it difficult to model their patterns and detect anomalies. In this paper, we propose a novel dual-variable graph variational autoencoder (VAE) for unsupervised anomaly detection on microservice traces. To reconstruct the time consumption of nodes, we propose a novel dispatching layer. We find that NLL inversion occurs for some anomalous samples: they receive a lower negative log-likelihood (NLL) than normal samples, which makes the NLL infeasible as an anomaly score. To address this, we point out that the NLL can be decomposed into a KL-divergence term and a data entropy term, and that lower-dimensional anomalies can introduce an entropy gap relative to normal inputs. We propose three techniques to mitigate this entropy gap for trace anomaly detection: Bernoulli & Categorical Scaling, Node Count Normalization, and Gaussian Std-Limit. On five trace datasets from a top Internet company, our proposed TraceVAE achieves excellent F-scores.
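The decomposition mentioned above is the standard cross-entropy identity: for data drawn from p and a model q, E_{x~p}[-log q(x)] = KL(p || q) + H(p). The sketch below is a toy illustration of why this matters, not the TraceVAE model or the paper's actual techniques. It uses hypothetical categorical distributions to show that a low-entropy anomaly can receive a lower expected NLL than normal data even though its KL-divergence from the model is larger. It also includes a Gaussian NLL with a lower bound on the standard deviation. The function `gaussian_nll` and the floor value `sigma_min` are assumptions chosen for illustration as one plausible reading of a "std limit"; the paper's formulation may differ.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability entries contribute nothing."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_nll(p, q):
    """Expected NLL (cross-entropy) of model q on data drawn from p."""
    return sum(-pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical "trained model" over three categories.
model_q = [0.5, 0.25, 0.25]
# Normal data follow the model exactly, so KL = 0 and the NLL equals H(p).
normal_p = [0.5, 0.25, 0.25]
# Low-entropy anomaly: all mass on the model's most likely category.
anomaly_p = [1.0, 0.0, 0.0]

for name, p in [("normal", normal_p), ("anomaly", anomaly_p)]:
    nll = expected_nll(p, model_q)
    # Decomposition check: E[-log q(x)] = KL(p || q) + H(p)
    assert math.isclose(nll, kl(p, model_q) + entropy(p))
    print(f"{name}: NLL={nll:.3f} KL={kl(p, model_q):.3f} H={entropy(p):.3f}")

def gaussian_nll(x, mu, sigma, sigma_min=None):
    """Hypothetical helper: NLL of x under N(mu, sigma^2), optionally
    clamping sigma from below so the density cannot grow without bound."""
    if sigma_min is not None:
        sigma = max(sigma, sigma_min)
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

# A tiny predicted std drives the NLL far below zero; a floor bounds it.
print(gaussian_nll(0.0, 0.0, 1e-3))
print(gaussian_nll(0.0, 0.0, 1e-3, sigma_min=0.1))
```

In this toy setting the anomaly has zero entropy, so its NLL equals its KL term alone (ln 2), which is smaller than the normal data's NLL (pure entropy, about 1.04 nats). A score based on raw NLL would therefore rank the anomaly as more "normal", which is the inversion the abstract describes.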
Xie, Z., Xu, H., Chen, W., Li, W., Jiang, H., Su, L., … Pei, D. (2023). Unsupervised Anomaly Detection on Microservice Traces through Graph VAE. In ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023 (pp. 2874–2884). Association for Computing Machinery, Inc. https://doi.org/10.1145/3543507.3583215