SC4OSINT: A Story Clustering Approach to Optimize OSINT Analysis

Publication: Contribution to book or conference proceedings › Talk with paper in proceedings › Peer-reviewed

Abstract

Cyber Threat Intelligence (CTI) has become an indispensable element of cybersecurity operations. As a result, any mechanism or tool that alleviates the workload of security analysts is highly valuable to the cybersecurity community. Recent advancements in Natural Language Processing (NLP) enable efficient processing of news articles, leading us to propose a clustering approach based on the reported story. This allows Open Source Intelligence (OSINT) analysts to manage information overload and focus only on essential events. The contributions of this paper are threefold: we (i) identify the relevant requirements for designing an OSINT clustering tool, (ii) present a solution that supports these requirements, and (iii) evaluate the solution against the needs of OSINT analysts. Our clustering approach, denoted as SC4OSINT, is inspired by an existing semi-supervised graph-based story clustering method and adapts it to the OSINT requirements. Unlike the original method, SC4OSINT is a fully unsupervised two-layer approach that handles multilingual streaming data and uses sentence transformers to create fine-grained clusters. Additionally, it uses an unsupervised method to extract keywords, prioritize those relevant to the OSINT domain, and form keyword communities. These communities are further refined using document cosine similarity, a more efficient alternative to the pairwise document comparisons in the existing approach. We evaluate SC4OSINT's story clustering approach by having security experts rate the clustering quality across various model configurations. The results show that the best configuration achieves an average rating of 4.19/5, demonstrating the efficiency of our approach.
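
The refinement step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each document in a keyword community is compared with one running centroid per story instead of with every other document, and it uses plain word counts as a stand-in for sentence-transformer embeddings. The function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def refine_community(docs, threshold=0.3):
    """Split one keyword community into stories (hypothetical helper).

    Each incoming document is compared with one centroid per story,
    not with every earlier document. Word counts replace sentence
    embeddings here (an assumption, to avoid external dependencies).
    """
    stories = []  # each story: {"centroid": Counter, "members": [indices]}
    for idx, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, threshold
        for story in stories:
            sim = cosine(vec, story["centroid"])
            if sim >= best_sim:
                best, best_sim = story, sim
        if best is None:
            # No story is similar enough: start a new one.
            best = {"centroid": Counter(), "members": []}
            stories.append(best)
        best["centroid"].update(vec)  # running sum; cosine ignores scale
        best["members"].append(idx)
    return [story["members"] for story in stories]


docs = [
    "ransomware attack hits hospital network",
    "hospital network hit by ransomware attack",
    "new phishing campaign targets bank customers",
]
clusters = refine_community(docs)
```

Because documents are processed one at a time, the same loop also fits a streaming setting; in practice the word-count vectors would be replaced by multilingual sentence embeddings and the threshold tuned on real data.
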
Original language: English
Title: Availability, Reliability and Security
Subtitle: ARES 2025 International Workshops, Ghent, Belgium, August 11–14, 2025, Proceedings, Part II
Pages: 5–24
Volume: 15995
ISBN (electronic): 978-3-032-00633-2
Publication status: Published - 2025
Event: ARES 2025 International Workshops - Ghent, Belgium
Duration: 11 Aug 2025 to 14 Aug 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15995
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Workshop

Workshop: ARES 2025 International Workshops
Country/Territory: Belgium
City: Ghent
Period: 11/08/25 to 14/08/25

Research Field

  • Cyber Security
  • Multimodal Analytics
