DeepD2V - Deep Learning and Domain Word Embeddings for DGA based Malware Detection

Lucas Torrealba, Pedro Casas-Hernandez (Author and Speaker), Javier Bustos-Jiménez, Germán Capdehourat, Mislav Findrik

Publication: Contribution to book or conference proceedings › Talk with paper in conference proceedings › Peer-reviewed

Abstract

The rapid detection of Domain Generation Algorithm (DGA) domains plays a critical role in mitigating malware propagation and its potential impact, as well as in limiting botnet activity coordination through command and control (C&C) servers. We introduce DeepD2V, a deep-learning-driven approach for highly accurate detection of DGA-generated domains, leveraging word embeddings learned from observed domain names in DNS queries or browsing URLs. Domain embeddings are constructed with Dom2Vec (D2V), a novel technique that builds on top of word embedding models (e.g., Word2Vec) to map words and tokens extracted from domain names into highly expressive representations. DeepD2V integrates a deep Convolutional Neural Network (CNN) architecture to make the most of the D2V embeddings, achieving unprecedented detection performance at low false-alarm rates. Through experimental evaluation on a large-scale dataset (almost 400,000 domains) comprising 25 distinct families of DGA domains, we demonstrate the superiority of D2V embeddings over the standard n-gram-based features commonly used in the literature for DGA detection. We show that DeepD2V significantly outperforms current state-of-the-art approaches for DGA detection and analysis based on shallow learning and lexicographic analysis, achieving precision and recall above 97%.
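The abstract describes the pipeline only at a high level: tokens are extracted from domain names, mapped to learned embeddings, and passed through a CNN that outputs a DGA score. The Python sketch below illustrates that general shape under clearly stated assumptions. Character trigrams of the second-level label stand in for D2V's token extraction, which the abstract does not specify. The hypothetical `ToyEmbeddings` class replaces trained Word2Vec/D2V vectors with seeded random ones, and all weights are untrained, so the printed scores carry no meaning. Every name here (`char_ngrams`, `ToyEmbeddings`, `conv1d_maxpool`, `score_domain`) is illustrative. This is not the authors' DeepD2V implementation.

```python
import math
import random


def char_ngrams(domain: str, n: int = 3) -> list[str]:
    """Hypothetical tokenizer: overlapping character n-grams of the second-level label.

    Simplification: takes the second-to-last label, so multi-part suffixes
    such as "co.uk" are not handled.
    """
    labels = domain.lower().rstrip(".").split(".")
    label = labels[-2] if len(labels) >= 2 else labels[0]
    if len(label) < n:
        return [label]  # short labels become a single token
    return [label[i:i + n] for i in range(len(label) - n + 1)]


class ToyEmbeddings:
    """Hypothetical stand-in for a trained embedding table (random, seeded vectors)."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: dict[str, list[float]] = {}

    def __getitem__(self, token: str) -> list[float]:
        if token not in self.table:  # assign a vector the first time a token is seen
            self.table[token] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return self.table[token]


def conv1d_maxpool(seq, filters, bias, width):
    """Valid 1D convolution over the token axis, ReLU, then global max pooling.

    seq: T vectors of length d; each filter is a flat list of width * d weights.
    Returns one pooled activation per filter.
    """
    if len(seq) < width:  # zero-pad short sequences so at least one window exists
        seq = seq + [[0.0] * len(seq[0]) for _ in range(width - len(seq))]
    pooled = []
    for w, b in zip(filters, bias):
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe initial maximum
        for t in range(len(seq) - width + 1):
            window = [x for vec in seq[t:t + width] for x in vec]
            act = max(0.0, sum(wi * xi for wi, xi in zip(w, window)) + b)
            best = max(best, act)
        pooled.append(best)
    return pooled


def score_domain(domain, emb, filters, bias, out_w, out_b, width=2):
    """Embed tokens, apply the convolution, and squash a linear read-out to (0, 1)."""
    seq = [emb[tok] for tok in char_ngrams(domain)]
    features = conv1d_maxpool(seq, filters, bias, width)
    z = sum(w * f for w, f in zip(out_w, features)) + out_b
    return 1.0 / (1.0 + math.exp(-z))  # untrained "DGA probability"


if __name__ == "__main__":
    rng = random.Random(42)
    dim, n_filters, width = 8, 4, 2
    emb = ToyEmbeddings(dim, seed=1)
    filters = [[rng.uniform(-0.3, 0.3) for _ in range(width * dim)] for _ in range(n_filters)]
    bias = [0.0] * n_filters
    out_w = [rng.uniform(-1.0, 1.0) for _ in range(n_filters)]
    for d in ["google.com", "xjw3kqzplr.net"]:
        print(d, round(score_domain(d, emb, filters, bias, out_w, 0.0, width), 3))
```

Global max pooling over the token axis is what lets a CNN of this kind score domains of any length with a fixed-size feature vector. In a real system, both the embedding table and the convolution weights would be learned from labeled benign and DGA domains.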
Original language: English
Title: 2024 IEEE International Conference on Machine Learning for Communication and Networking
Subtitle: ICMLCN
Pages: 164-170
Number of pages: 7
ISBN (electronic): 979-8-3503-4319-9
Publication status: Published - 15 Aug 2024
Event: 2024 IEEE International Conference on Machine Learning for Communication and Networking - Stockholm, Sweden
Duration: 5 May 2024 to 8 May 2024

Conference

Conference: 2024 IEEE International Conference on Machine Learning for Communication and Networking
Abbreviated title: ICMLCN 2024
Country/Territory: Sweden
City: Stockholm
Period: 5/05/24 to 8/05/24

Research Field

  • Multimodal Analytics

