Phish Me If You Can - Lexicographic Analysis and Machine Learning for Phishing Websites Detection with PHISHWEB

Lucas Torrealba (Author and Speaker), Pedro Casas-Hernandez, Javier Bustos-Jiménez, Germán Capdehourat, Mislav Findrik

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › peer-review

Abstract

We introduce PHISHWEB, a novel approach to website phishing detection that detects and categorizes malicious websites through a progressive, multi-layered analysis. PHISHWEB detects forged domains, such as those created through homoglyph and typosquatting attacks, as well as domains generated automatically by Domain Generation Algorithms (DGAs). PHISHWEB focuses on lexicographic analysis of the domain name itself, which improves the applicability and scalability of the approach. Preliminary results from applying PHISHWEB to multiple open domain-name datasets show precision and recall above 90%. We additionally extend PHISHWEB’s detection of DGA domains through Machine Learning (ML), using a small set of highly specialized lexicographic domain features. For a false alarm rate below 1%, the ML extension of PHISHWEB improves on both the non-ML PHISHWEB DGA detector and the state of the art by at least 60%, achieving precision and recall of 93.1% and 84.8%, respectively. Finally, we present preliminary results from applying PHISHWEB to real, in-the-wild DNS requests collected at large operational mobile and fixed-line networks, and discuss some of the findings.
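As a rough illustration of what lexicographic analysis of a domain name can look like, the sketch below computes a few simple string features and flags near-matches to known brand names by edit distance. It is not PHISHWEB's implementation: the abstract does not list the features or thresholds used, so the feature set (length, character entropy, digit and vowel ratios), the function names, and the `max_dist` threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Character-level Shannon entropy of s, in bits."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def lexical_features(domain: str) -> dict:
    """Simple string features of the leftmost label (an assumed feature set).

    Taking the leftmost label is a simplification; a real system would
    extract the registered domain using a public-suffix list.
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / n if n else 0.0,
        "vowel_ratio": vowels / n if n else 0.0,
    }


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def looks_like_typosquat(domain: str, brands: list, max_dist: int = 1) -> bool:
    """True if the label is close to, but not identical to, a known brand.

    max_dist is a hypothetical threshold, not a value from the paper.
    """
    label = domain.lower().split(".")[0]
    return any(0 < edit_distance(label, b) <= max_dist for b in brands)
```

Features of this kind could be fed to any standard classifier to separate algorithmically generated names from human-chosen ones, while the edit-distance check targets typosquatting specifically. Homoglyph detection would additionally need a mapping of visually confusable characters, which is omitted here.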
Original language: English
Title of host publication: IEEE 9th International Conference on Network Softwarization (NetSoft)
Place of Publication: 2023
Pages: 252
Number of pages: 256
ISBN (Electronic): 979-8-3503-9980-6
Publication status: Published - 13 Jul 2023
Event: 9th IEEE International Conference on Network Softwarization, NetSoft 2023
Duration: 19 Jun 2023 to 23 Jun 2023

Conference

Conference: 9th IEEE International Conference on Network Softwarization, NetSoft 2023
Period: 19/06/23 to 23/06/23

Research Field

  • Former Research Field - Data Science

Keywords

  • Phishing Websites
  • DNS
  • Lexicographic Analysis
  • Machine Learning
