BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model

Abstract

Logs are a primary information resource for fault diagnosis and anomaly detection in large-scale computer systems, but classifying anomalies from system logs is difficult. Recent studies focus on extracting semantic information from unstructured log messages and converting it into word vectors, and, because log sequences are time-series data, LSTM-based approaches have typically been applied. Word2Vec is an up-to-date encoding method, but it does not take the order of words in a sequence into account. In this article, we propose BERT-Log, which regards a log sequence as a natural-language sequence, uses a pre-trained language model to learn the semantic representation of normal and anomalous logs, and fine-tunes the BERT model with a fully connected neural network to detect anomalies. It can capture all the semantic information in a log sequence, including context and position. BERT-Log achieves the highest performance among all methods on the HDFS dataset, with an F1-score of 99.3%. We also propose a new log feature extractor for the BGL dataset that obtains log sequences via a sliding window defined by node ID, window size, and step size. BERT-Log detects anomalies on the BGL dataset with an F1-score of 99.4%, a 19% improvement over LogRobust and a 7% improvement over HitAnomaly.
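To make the fine-tuning setup concrete, below is a minimal sketch of a BERT-based binary classifier for log sequences, assuming the Hugging Face transformers library. The model name (bert-base-uncased), sequence length, and label convention are illustrative assumptions, not the paper's exact configuration.

import torch
from torch import nn
from transformers import BertTokenizer, BertModel

class BertLogClassifier(nn.Module):
    def __init__(self, pretrained="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        # Fully connected head fine-tuned to separate normal vs. anomalous.
        self.classifier = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # The pooled [CLS] output summarizes the whole log sequence,
        # including context and word-position information.
        return self.classifier(out.pooler_output)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertLogClassifier()

# A log sequence is treated as a natural-language sequence of event templates.
sequence = "Receiving block src dest. PacketResponder terminating."
enc = tokenizer(sequence, truncation=True, max_length=512,
                padding="max_length", return_tensors="pt")
logits = model(enc["input_ids"], enc["attention_mask"])
pred = logits.argmax(dim=-1)  # 0 = normal, 1 = anomalous (assumed labels)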
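The sliding-window feature extractor for BGL can likewise be sketched in a few lines. The following is an assumption-laden illustration: it presumes log lines have already been parsed into (node_id, event) pairs, and the helper name and default window/step sizes are hypothetical, not values from the paper.

from collections import defaultdict

def sliding_window_sequences(parsed_logs, window_size=20, step_size=10):
    """Group events per node, then slide a fixed-size window with a
    fixed step over each node's event stream to form log sequences."""
    by_node = defaultdict(list)
    for node_id, event in parsed_logs:
        by_node[node_id].append(event)

    sequences = []
    for node_id, events in by_node.items():
        for start in range(0, max(len(events) - window_size + 1, 1), step_size):
            sequences.append((node_id, events[start:start + window_size]))
    return sequences

logs = [("R30-M0", "data TLB error"), ("R30-M0", "instruction cache parity"),
        ("R71-M1", "machine check interrupt")]
for node, seq in sliding_window_sequences(logs, window_size=2, step_size=1):
    print(node, seq)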

Citation (APA)
Chen, S., & Liao, H. (2022). BERT-Log: Anomaly Detection for System Logs Based on Pre-trained Language Model. Applied Artificial Intelligence, 36(1). https://doi.org/10.1080/08839514.2022.2145642
