Abstract
As threats to computer system security continue to grow, Intrusion Detection Systems (IDS) are becoming increasingly important. Among these, anomaly-based IDS use statistical analysis or machine learning techniques to model normal system behavior and detect deviations that indicate potentially malicious events. Log-based anomaly detection in particular is gaining importance because it draws on system log data, which capture a wide variety of system events, and thus allows for effective and automated detection of malicious activities. Recent studies show that non-deep learning models, such as Bigram detectors, can achieve anomaly detection performance comparable to that of deep learning approaches. However, model training can be challenging when data or resources, both computational and human, are insufficient. To alleviate this issue, transfer learning, which reuses the knowledge of pre-trained models to address new but similar problems, becomes valuable. This thesis applies domain adaptation via Transfer Component Analysis (TCA), a non-deep, feature-based transfer learning approach. To validate the transfer learning performance, different anomaly detection models and publicly available log datasets are used in a comparative study. More specifically, this thesis leverages a Principal Component Analysis model, an Event Sequence Detector, and an Event Count Vector Clustering Detector to conduct a series of structured evaluations. For this, the BlueGene/L dataset and two subsets of the Thunderbird dataset, each originating from a supercomputer system, are used. Moreover, transfer learning is applied between the two Thunderbird subsets. The findings demonstrate that this approach can achieve performance levels similar to those of deep transfer learning methods reported in the literature, reaching F1 scores of 0.94.
In general, this thesis demonstrates the impact of the detection models, dataset preprocessing, and sequencing methods on the transferability of anomaly detection knowledge across different datasets. Finally, this thesis discusses the choice of datasets and issues with comparability to results in the existing literature.
Original language | English |
---|---|
Qualification | Master of Science |
Awarding Institution |
|
Supervisors/Advisors |
|
Award date | 16 Oct 2024 |
Publication status | Published - 16 Oct 2024 |
Research Field
- Cyber Security