Anomaly Detection (AD) in distributed cloud systems is the process of identifying unexpected (i.e. anomalous) behaviour. Many approaches from machine learning to statistical methods exist to detect anomalous data instances. However, no generic solutions exist for identifying appropriate metrics for monitoring and choosing adequate detection approaches. In this paper, we present the CRI-Model (Change, Rupture, Impact), which is a taxonomy based on a study of anomaly types in the literatureand an analysis of system outages in major cloud and web-portal companies. The taxonomy can be used as an anlaysis-tool on identified anomalies to discover gaps in the AD state of a system or determine components most often affected by a particular anomaly type. While the dimensions of the taxonomy are fixed, the categories can be adapted to different domains. We show the applicability of the taxonomy to distributed cloud systems using a large dataset of anomaly reports from a software company. The adaptability is further shown for the production automation domain, as a first attempt to generalize the taxonomy to other distributed systems.
Reichert, K., Pokahr, A., Hohenberger, T., Haubeck, C., & Lamersdorf, W. (2017). A taxonomy of anomalies in distributed cloud systems: The CRI-model. Studies in Computational Intelligence, 737, 247–261. https://doi.org/10.1007/978-3-319-66379-1_22