AIT_FHSTP at EXIST 2023 Benchmark: Sexism detection by transfer learning, sentiment and toxicity embeddings and hand-crafted features

Jaqueline Böck (Speaker, Invited), Mina Schütz (Speaker, Invited), Daria Liakhovets, Nathanya Queby Satriani, Andreas Babic, Djordje Slijepcevic, Matthias Zeppelzauer, Alexander Schindler

Research output: Chapter in Book or Conference Proceedings › Conference Proceedings with Oral Presentation › peer-review

Abstract

Sexism has become a widespread problem on social media and in online conversations. The sEXism Identification in Social neTworks (EXIST) challenge therefore addresses this issue at CLEF in 2023. In this year's version of this international benchmark, the goal is to automatically identify sexism in texts with the help of Natural Language Processing (NLP). The tasks are to determine whether a text is sexist, what the source intention behind it is, and which category of sexism it belongs to. This paper presents the contribution of our team, AIT_FHSTP, to the challenge. We present three approaches to solve the classification tasks of this year's shared task. The baseline for all three approaches is an XLM-RoBERTa model pre-trained with additional datasets and fine-tuned on the EXIST2023 data. For our second and third approaches, we extracted the fine-tuned embeddings of the model and concatenated them with additional features: in one approach, we added sentiment and toxicity model embeddings; in the other, we added multiple hand-crafted features and reduced the dimensionality with PCA. We then used these embeddings as input for a Random Forest classifier, which generated the final predictions. Our approach combining XLM-RoBERTa embeddings with additional hand-crafted features and PCA achieved the 1st rank on the soft-soft evaluation of task 2 (source intention) with Spanish content and the 2nd rank for English content. For task 3 (sexism multilabel categorization), we achieved the 3rd rank in the hard-hard evaluation.
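The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays stand in for the fine-tuned XLM-RoBERTa embeddings and the hand-crafted features, and the array sizes, the number of PCA components, the binary labels, and the choice to apply PCA to the concatenated vector are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples = 200

# Placeholder inputs (hypothetical shapes, random values):
text_emb = rng.normal(size=(n_samples, 768))    # stands in for fine-tuned transformer embeddings
handcrafted = rng.normal(size=(n_samples, 20))  # stands in for hand-crafted features
labels = rng.integers(0, 2, size=n_samples)     # stands in for binary sexism labels

# Concatenate both feature groups into one vector per text.
features = np.concatenate([text_emb, handcrafted], axis=1)

# Reduce dimensionality with PCA, then classify with a Random Forest.
clf = make_pipeline(
    PCA(n_components=50, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
clf.fit(features[:150], labels[:150])

# Class probabilities for the held-out rows; soft outputs like these are the
# kind of prediction a soft evaluation would consume.
probs = clf.predict_proba(features[150:])
```

Wrapping PCA and the classifier in one pipeline means the projection is fitted on the training rows only and then reused unchanged at prediction time.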
Original language: English
Title of host publication: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023)
Editors: Mohammed Aliannejadi, Guglielmo Faggioli, Nicola Ferro, Michalis Vlachos
Place of Publication: Thessaloniki, Greece
Publisher: CEUR-WS
Pages: 878-890
Number of pages: 13
Volume: 3497
ISBN (Electronic): ISSN 1613-0073
Publication status: Published - 5 Oct 2023
Event: CLEF 2023 Conference and Labs of the Evaluation Forum: Information Access Evaluation meets Multilinguality, Multimodality, and Visualization - Thessaloniki, Thessaloniki, Greece
Duration: 18 Sept 2023 to 21 Sept 2023

Conference

Conference: CLEF 2023 Conference and Labs of the Evaluation Forum
Abbreviated title: CLEF 2023
Country/Territory: Greece
City: Thessaloniki
Period: 18/09/23 to 21/09/23

Research Field

  • Former Research Field - Data Science

Keywords

  • Sexism detection
  • Sexism identification
  • Social Media Retrieval
  • Transformer Models
  • Natural Language Processing
