TY - JOUR
T1 - Analysis of statistical properties of variables in log data for advanced anomaly detection in cyber security
AU - Wurzenberger, Markus
AU - Höld, Georg
AU - Landauer, Max
AU - Skopik, Florian
PY - 2024/2
Y1 - 2024/2
N2 - Log lines consist of static parts that characterize their structure and enable assignment of event types, and event parameters, i.e., variable parts that provide specific information on system processes, such as host and user names, IP addresses, and file operations. Many detection approaches focus only on anomalous event type occurrences, i.e., they parse log lines to derive unique event identifiers and subsequently detect anomalies in event sequences or event count vectors, but neglect the variable parts of log lines entirely during analysis. This is especially problematic when monitoring strongly structured log data that contains only a small number of distinct event types, for example, logs that consist of strict key-value pairs, i.e., parameters that occur consistently throughout all log lines, as is the case in access and audit logs. Thus, novel approaches are required that focus on the analysis of the variable parts of log lines. In this paper, we propose the variable type detector (VTD), a novel unsupervised approach that autonomously analyzes variable log line parts to enable anomaly detection. It assigns data types to each variable, which also include probability distributions for discrete and continuous variables. The VTD raises an alarm if a variable's data type changes. Furthermore, it implements a robust indicator function that reduces false positives by tracking the data type history of each variable and reports only significant data type changes. Additionally, an event indicator enables event-based anomaly detection by taking into account the data types of all variables of a single event type. The evaluation conducted on open-source log data demonstrates the effectiveness of the VTD compared to conventional anomaly detection approaches, such as time series analysis and PCA.
Consequently, the VTD acts as a solution that extends the intrusion detection capabilities of security information and event management (SIEM) and integrates with modern concepts of endpoint detection and response (EDR) and extended detection and response (XDR), while simultaneously serving as an asset for process monitoring that supports user and entity behavior analytics (UEBA).
AB - Log lines consist of static parts that characterize their structure and enable assignment of event types, and event parameters, i.e., variable parts that provide specific information on system processes, such as host and user names, IP addresses, and file operations. Many detection approaches focus only on anomalous event type occurrences, i.e., they parse log lines to derive unique event identifiers and subsequently detect anomalies in event sequences or event count vectors, but neglect the variable parts of log lines entirely during analysis. This is especially problematic when monitoring strongly structured log data that contains only a small number of distinct event types, for example, logs that consist of strict key-value pairs, i.e., parameters that occur consistently throughout all log lines, as is the case in access and audit logs. Thus, novel approaches are required that focus on the analysis of the variable parts of log lines. In this paper, we propose the variable type detector (VTD), a novel unsupervised approach that autonomously analyzes variable log line parts to enable anomaly detection. It assigns data types to each variable, which also include probability distributions for discrete and continuous variables. The VTD raises an alarm if a variable's data type changes. Furthermore, it implements a robust indicator function that reduces false positives by tracking the data type history of each variable and reports only significant data type changes. Additionally, an event indicator enables event-based anomaly detection by taking into account the data types of all variables of a single event type. The evaluation conducted on open-source log data demonstrates the effectiveness of the VTD compared to conventional anomaly detection approaches, such as time series analysis and PCA.
Consequently, the VTD acts as a solution that extends the intrusion detection capabilities of security information and event management (SIEM) and integrates with modern concepts of endpoint detection and response (EDR) and extended detection and response (XDR), while simultaneously serving as an asset for process monitoring that supports user and entity behavior analytics (UEBA).
KW - Intrusion detection
KW - Log analysis
KW - Anomaly detection
UR - https://doi.org/10.1016/j.cose.2023.103631
U2 - 10.1016/j.cose.2023.103631
DO - 10.1016/j.cose.2023.103631
M3 - Article
SN - 0167-4048
VL - 137
JO - Computers & Security
JF - Computers & Security
M1 - 103631
ER -