Dom2Vec - Detecting DGA Domains Through Word Embeddings and AI/ML-Driven Lexicographic Analysis

Lucas Torrealba (Author and Speaker), Pedro Casas-Hernandez, Javier Bustos-Jiménez, Germán Capdehourat, Mislav Findrik

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › peer-review

Abstract

The timely identification of DNS queries to Domain Generation Algorithm (DGA) domains plays a critical role in mitigating malware propagation and its potential impact, especially in thwarting coordinated botnet activity. We introduce Dom2Vec, an innovative approach for swiftly detecting DGA-generated domains by leveraging lexicographic features exclusively derived from the observed domain names in DNS queries. Dom2Vec leverages word embeddings to map tokens extracted from domain names into highly expressive representations. These representations are then combined with a reputation-based scoring system for domain names, which utilizes the co-occurrence frequency of n-grams in relation to a list of whitelisted domains. The fusion of domain embeddings, reputation scores, and other meaningful lexicographic features derived from domain names provides robust domain name representations for AI/ML-driven detection of DGAs. Through experimental evaluation on a dataset comprising 25 distinct families of DGA domains, we demonstrate that Dom2Vec significantly outperforms current state-of-the-art approaches for DGA detection and analysis, improving our previous detection system based on reputation scores by at least 30%, for a false-alarm rate below 1%.
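The n-gram reputation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trigram size, the log-scaled averaging, the toy whitelist, and the function names `ngrams`, `build_ngram_counts`, and `reputation_score` are all assumptions made for illustration. The sketch only shows the general principle that a domain whose character n-grams rarely occur in whitelisted domains receives a low score.

```python
import math
from collections import Counter


def ngrams(label, n=3):
    """Return the character n-grams of a domain label."""
    return [label[i:i + n] for i in range(len(label) - n + 1)]


def build_ngram_counts(whitelist, n=3):
    """Count how often each n-gram occurs across whitelisted domains."""
    counts = Counter()
    for domain in whitelist:
        # Simplification (assumption): score only the leftmost label.
        label = domain.lower().split(".")[0]
        counts.update(ngrams(label, n))
    return counts


def reputation_score(domain, counts, n=3):
    """Average log-scaled whitelist frequency of the domain's n-grams.

    Hypothetical scoring rule for illustration; the paper's exact
    formula is not given in the abstract.
    """
    label = domain.lower().split(".")[0]
    grams = ngrams(label, n)
    if not grams:
        return 0.0
    return sum(math.log1p(counts[g]) for g in grams) / len(grams)


# Toy whitelist (assumption); a real system would use a large list
# of popular, known-benign domains.
WHITELIST = ["google.com", "facebook.com", "wikipedia.org",
             "amazon.com", "youtube.com"]

counts = build_ngram_counts(WHITELIST)
benign = reputation_score("googlemail.com", counts)
suspect = reputation_score("xkqzvwpjh.net", counts)
print(f"googlemail.com: {benign:.3f}  xkqzvwpjh.net: {suspect:.3f}")
```

In a pipeline like the one the abstract describes, a score of this kind would be one feature among several, combined with domain embeddings and other lexicographic features before classification.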
Original language: English
Title of host publication: 2023 19th International Conference on Network and Service Management (CNSM)
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 978-3-903176-59-1
DOIs
Publication status: Published - 28 Nov 2023
Event: 2023 19th International Conference on Network and Service Management (CNSM) - Niagara Falls, ON, Ontario, Canada
Duration: 30 Oct 2023 to 2 Nov 2023

Conference

Conference: 2023 19th International Conference on Network and Service Management (CNSM)
Country/Territory: Canada
City: Niagara Falls, Ontario
Period: 30/10/23 to 2/11/23

Research Field

  • Former Research Field - Data Science
