Exploring Language Similarity: A Comparative Analysis of European Languages using the Needleman-Wunsch Algorithm

Antonia Langer

Publication: Thesis, Master's thesis

Abstract

This master's thesis explores the degree of similarity between 22 European languages using the Needleman-Wunsch algorithm. The comparison is performed on written text extracted from the website of the European Union with a custom-built web scraper. In addition to the language comparison, the thesis analyzes how text size and preprocessing affect the results. To this end, texts of three different sizes are extracted for each language, and each text size is subjected to different combinations of four preprocessing techniques: lemmatization, removal of diacritics, removal of whitespace, and removal of punctuation. The results of the language comparison are visualized as dendrograms and heatmaps, which show the relationships and similarities among the languages under investigation.
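The comparison described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`strip_diacritics`, `preprocess`, `needleman_wunsch_score`) and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for illustration, and lemmatization is left out because it needs language-specific tooling that the abstract does not name.

```python
import string
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after NFD decomposition (e.g. 'é' -> 'e').

    Letters without a canonical decomposition (such as 'ø') are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(text: str, diacritics: bool = True,
               punctuation: bool = True, whitespace: bool = True) -> str:
    """Apply three of the four steps named in the abstract (no lemmatization)."""
    if diacritics:
        text = strip_diacritics(text)
    if punctuation:
        # string.punctuation covers ASCII punctuation only (a simplification).
        text = "".join(ch for ch in text if ch not in string.punctuation)
    if whitespace:
        text = "".join(text.split())
    return text


def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of a and b, keeping two rows of the DP table."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Calling `needleman_wunsch_score(preprocess(text_a), preprocess(text_b))` for every pair of languages would fill a symmetric score matrix, which is the kind of input a heatmap or a hierarchical clustering (dendrogram) works from. Toggling the keyword arguments of `preprocess` mirrors the idea of comparing different combinations of preprocessing steps.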
Original language: English
Qualification: Master of Science
Awarding institution
  • University of Applied Sciences Technikum Wien
Supervisor / Advisor
  • Schütz, Mina, Supervisor
  • Knapp, Bernhard, Supervisor, External person
Date of approval: 9 Oct 2023
Publication status: Published - Oct 2023

Research Field

  • Former Research Field - Data Science
