TY - JOUR
T1 - Maintainable Log Datasets for Evaluation of Intrusion Detection Systems
AU - Landauer, Max
AU - Skopik, Florian
AU - Frank, Maximilian
AU - Hotwagner, Wolfgang
AU - Wurzenberger, Markus
AU - Rauber, Andreas
PY - 2022
Y1 - 2022
N2 - Intrusion detection systems (IDS) monitor system logs and network traffic to recognize malicious activities in computer networks. Evaluating and comparing IDSs with respect to their detection accuracies is thereby essential for their selection in specific use-cases. Despite a great need, hardly any labeled intrusion detection datasets are publicly available. As a consequence, evaluations are often carried out on datasets from real infrastructures, where analysts cannot control system parameters or generate a reliable ground truth, or private datasets that prevent reproducibility of results. As a solution, we present a collection of maintainable log datasets collected in a testbed representing a small enterprise. Thereby, we employ extensive state machines to simulate normal user behavior and inject a multi-step attack. For scalable testbed deployment, we use concepts from model-driven engineering that enable automatic generation and labeling of an arbitrary number of datasets that comprise repetitions of attack executions with variations of parameters. In total, we provide 8 datasets containing 20 distinct types of log files, of which we label 8 files for 10 unique attack steps. We publish the labeled log datasets and code for testbed setup and simulation online as open-source to enable others to reproduce and extend our results.
AB - Intrusion detection systems (IDS) monitor system logs and network traffic to recognize malicious activities in computer networks. Evaluating and comparing IDSs with respect to their detection accuracies is thereby essential for their selection in specific use-cases. Despite a great need, hardly any labeled intrusion detection datasets are publicly available. As a consequence, evaluations are often carried out on datasets from real infrastructures, where analysts cannot control system parameters or generate a reliable ground truth, or private datasets that prevent reproducibility of results. As a solution, we present a collection of maintainable log datasets collected in a testbed representing a small enterprise. Thereby, we employ extensive state machines to simulate normal user behavior and inject a multi-step attack. For scalable testbed deployment, we use concepts from model-driven engineering that enable automatic generation and labeling of an arbitrary number of datasets that comprise repetitions of attack executions with variations of parameters. In total, we provide 8 datasets containing 20 distinct types of log files, of which we label 8 files for 10 unique attack steps. We publish the labeled log datasets and code for testbed setup and simulation online as open-source to enable others to reproduce and extend our results.
UR - https://doi.org/10.1109/TDSC.2022.3201582
U2 - 10.1109/TDSC.2022.3201582
DO - 10.1109/TDSC.2022.3201582
M3 - Article
SN - 1545-5971
SP - 1
EP - 17
JO - IEEE Transactions on Dependable and Secure Computing
JF - IEEE Transactions on Dependable and Secure Computing
ER -