Abstract
While deep learning (DL) has emerged as a powerful technique, its benefits must be carefully considered in relation to computational costs. Specifically, although DL methods have achieved strong performance in log anomaly detection, they often require extended time for log preprocessing, model training, and model inference, hindering their adoption in online distributed cloud systems that require rapid deployment of log anomaly detection service. This paper investigates the superiority of DL methods compared to simpler techniques in log anomaly detection. We evaluate basic algorithms (e.g., KNN, SLFN) and DL approaches (e.g., CNN) on five public log anomaly detection datasets (e.g., HDFS). Our findings demonstrate that simple algorithms outperform DL methods in both time efficiency and accuracy. For instance, on the Thunderbird dataset, the K-nearest neighbor algorithm trains 1,000 times faster than NeuralLog while achieving a higher F1-Score by 0.0625. We also identify three factors contributing to this phenomenon, which are: (1) redundant log preprocessing strategies, (2) dataset simplicity, and (3) the nature of binary classification in log anomaly detection. To assess the necessity of DL, we propose LightAD, an architecture that optimizes training time, inference time, and performance score. With automated hyper-parameter tuning, LightAD allows fair comparisons among log anomaly detection models, enabling engineers to evaluate the suitability of complex DL methods. Our findings serve as a cautionary tale for the log anomaly detection community, highlighting the need to critically analyze datasets and research tasks before adopting DL approaches. Researchers proposing computationally expensive models should benchmark their work against lightweight algorithms to ensure a comprehensive evaluation.
Author supplied keywords
Cite
CITATION STYLE
Yu, B., Zhong, Z., Yao, J., Xie, H., Fu, Q., Wu, Y., … He, P. (2024). Deep Learning or Classical Machine Learning? An Empirical Study on Log-Based Anomaly Detection. In Proceedings - International Conference on Software Engineering. IEEE Computer Society. https://doi.org/10.1145/3597503.3623308
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.