Intrusion detection systems are essential for network security. To verify their detection capabilities and facilitate comparison, benchmark log datasets are used to measure evaluation metrics such as accuracy and false alarm rates. For this purpose, these datasets must come with a correct ground truth that differentiates normal and attacker behavior. While it is relatively straightforward to generate labels for network-based datasets by selecting events according to the IP addresses of attacker hosts, system logs do not necessarily involve such identifiers, and their events may only be recognizable as malicious through their combined occurrences. Further problems arise when log data is collected in model-driven testbeds, i.e., automatically generated networks that simulate differently parameterized attack scenarios in diverse infrastructures. In these testbeds, parameters such as IP addresses are subject to change and thus cannot simply be used for matching. We therefore propose a framework that integrates template-based labeling rules for model-driven testbeds. In this paper, we describe the syntax for rule templates with different query types specifically designed to match sequential or interrelated system log events. An evaluation of our open-source implementation shows that only 27 rules are necessary to assign 15 labels to 8 system log files containing attack manifestations.
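The paper defines the full rule syntax, including query types for sequential and interrelated events. As a rough illustration of the underlying idea only, the following Python sketch shows how a template-based labeling rule with a testbed parameter placeholder might be instantiated and applied. All names (RULE_TEMPLATES, instantiate, label_log, the {attacker_ip} placeholder) and the rule format are hypothetical assumptions, not the framework's actual syntax, and the sketch covers only simple per-line matching rather than the sequential queries described in the paper.

import re

# Hypothetical rule templates (not the paper's actual rule language):
# placeholders such as {attacker_ip} are filled in from the testbed's model
# parameters, so the same template can label logs from differently
# parameterized testbed runs.
RULE_TEMPLATES = [
    {"labels": ["attacker_login"],
     "pattern": r"Accepted password for \w+ from {attacker_ip} port \d+"},
    {"labels": ["attacker_http"],
     "pattern": r'{attacker_ip} - - \[.*\] "GET /'},
]

def instantiate(pattern, params):
    """Substitute concrete testbed parameters into a rule template."""
    escaped = {key: re.escape(value) for key, value in params.items()}
    return re.compile(pattern.format(**escaped))

def label_log(lines, templates, params):
    """Return (line, labels) pairs; labels come from all matching rules."""
    rules = [(t["labels"], instantiate(t["pattern"], params)) for t in templates]
    result = []
    for line in lines:
        labels = []
        for rule_labels, regex in rules:
            if regex.search(line):
                labels.extend(rule_labels)
        result.append((line, labels))
    return result

# The attacker IP differs per testbed run and would be read from its model.
params = {"attacker_ip": "10.35.33.2"}
log_lines = [
    "Accepted password for admin from 10.35.33.2 port 53712 ssh2",
    "Accepted password for alice from 10.35.33.7 port 40100 ssh2",
]
for line, labels in label_log(log_lines, RULE_TEMPLATES, params):
    print(labels or ["normal"], line)

Because the templates are instantiated from the testbed model rather than hard-coded, the same rule set can be reused across testbed runs in which addresses and other parameters change.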
CITATION STYLE
Landauer, M., Frank, M., Skopik, F., Hotwagner, W., Wurzenberger, M., & Rauber, A. (2022). A Framework for Automatic Labeling of Log Datasets from Model-driven Testbeds for HIDS Evaluation. In SaT-CPS 2022 - Proceedings of the 2022 ACM Workshop on Secure and Trustworthy Cyber-Physical Systems (pp. 77–86). Association for Computing Machinery, Inc. https://doi.org/10.1145/3510547.3517924