Online anomaly detection is a vital element of operations in data centers and in utility clouds like Amazon EC2. Given ever-increasing data center sizes coupled with the complexity of systems software, applications, and workload patterns, such anomaly detection must operate automatically, at runtime, and without prior knowledge of normal or anomalous behaviors. Further, detection should function at different levels of abstraction, such as hardware and software, and for the multiple metrics used in cloud computing systems. This paper proposes EbAT (Entropy-based Anomaly Testing), which offers novel methods that detect anomalies by analyzing the distributions of arbitrary metrics rather than individual metric thresholds. Entropy is used as a measurement that captures the degree of dispersal or concentration of such distributions, aggregating raw metric data across the cloud stack to form entropy time series. For scalability, such time series can then be combined hierarchically and across multiple cloud subsystems. Experimental results on utility cloud scenarios demonstrate the viability of the approach. EbAT outperforms threshold-based methods, with an average 57.4% improvement in anomaly detection accuracy, and also does better by 59.3% on average in false alarm rate compared with a "near-optimum" threshold-based method.
Wang, C., Talwar, V., Schwan, K., & Ranganathan, P. (2010). Online detection of utility cloud anomalies using metric distributions. In Proceedings of the 2010 IEEE/IFIP Network Operations and Management Symposium, NOMS 2010 (pp. 96–103). IEEE Computer Society. https://doi.org/10.1109/NOMS.2010.5488443
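
To make the core idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation, of how an entropy time series might be derived from a single raw metric. It assumes fixed-width histogram bins, non-overlapping windows, and a simple "jump between consecutive windows" rule for flagging anomalies. The function names (`shannon_entropy`, `entropy_series`, `flag_anomalies`) and the parameters (`num_bins`, `window`, `max_jump`) are hypothetical choices made for illustration; the abstract does not specify binning, windowing, or the decision rule, and the hierarchical combination across subsystems is not shown.

```python
import math
from collections import Counter


def shannon_entropy(values, num_bins, lo, hi):
    """Shannon entropy (in bits) of the histogram of `values` over [lo, hi].

    Assumption: fixed-width bins; values outside the range are clamped
    into the first or last bin.
    """
    if not values:
        return 0.0
    width = (hi - lo) / num_bins
    counts = Counter(
        min(max(int((v - lo) / width), 0), num_bins - 1) for v in values
    )
    n = len(values)
    # H = sum over occupied bins of p * log2(1 / p)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_series(samples, window, num_bins=4, lo=0.0, hi=100.0):
    """Turn raw metric samples into one entropy value per non-overlapping window."""
    return [
        shannon_entropy(samples[i:i + window], num_bins, lo, hi)
        for i in range(0, len(samples) - window + 1, window)
    ]


def flag_anomalies(series, max_jump=0.5):
    """Hypothetical rule: flag a window whose entropy differs from the
    previous window's entropy by more than `max_jump` bits."""
    return [
        i for i in range(1, len(series))
        if abs(series[i] - series[i - 1]) > max_jump
    ]


# Example: CPU utilization (%) that is concentrated, then suddenly dispersed.
cpu = [10, 12, 11, 13, 11, 12, 10, 14, 5, 40, 70, 95]
series = entropy_series(cpu, window=4)
anomalous_windows = flag_anomalies(series)
```

In this example the first two windows fall entirely into one bin (entropy 0 bits), while the third spreads evenly over all four bins (entropy 2 bits), so only the third window is flagged. The point of the sketch is that the detector reacts to a change in the shape of the distribution, without any per-metric threshold on the raw values.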