Abstract
Sexism has become a widespread problem on social media and in online conversations. To address this issue, the sEXism Identification in Social neTworks (EXIST) challenge was held at CLEF 2023. In this year's edition of this international benchmark, the goal is to automatically identify sexism in texts using Natural Language Processing (NLP). The tasks are to determine whether a text is sexist, what the source intention behind it is, and which category of sexism it belongs to. This paper presents the contribution of our team, AIT_FHSTP, to the EXIST challenge at CLEF 2023. We present three approaches to the classification tasks of this year's shared task. The basis for all three approaches is an XLM-RoBERTa model pre-trained on additional datasets and fine-tuned on the EXIST2023 data. For our second and third approaches, we extracted the fine-tuned embeddings of the model and concatenated them with additional features. In one, we added embeddings from sentiment and toxicity models; in the other, we added multiple hand-crafted features and reduced the dimensionality with PCA. We then used these combined embeddings as input to a Random Forest classifier, which generated the final predictions. Our approach combining XLM-RoBERTa embeddings with additional hand-crafted features and PCA achieved 1st place in the soft-soft evaluation of Task 2 (source intention) on Spanish content and 2nd place on English content. For Task 3 (sexism multilabel categorization), we achieved 3rd place in the hard-hard evaluation.
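As a rough illustration of the feature-combination step summarized in the abstract (concatenating model embeddings with additional features, reducing dimensionality with PCA, and classifying with a Random Forest), the following is a minimal sketch assuming NumPy and scikit-learn. It is not the authors' implementation: the embeddings and hand-crafted features are random stand-ins, and the embedding size, number of hand-crafted features, number of PCA components, and number of trees are all hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

n_train, n_test = 200, 20
emb_dim = 768      # hypothetical: hidden size of an XLM-RoBERTa base model
n_crafted = 10     # hypothetical number of hand-crafted features
n_components = 32  # hypothetical PCA target dimensionality


def make_features(n):
    # Random stand-ins for fine-tuned transformer embeddings and
    # hand-crafted features, concatenated column-wise per text.
    emb = rng.normal(size=(n, emb_dim))
    crafted = rng.normal(size=(n, n_crafted))
    return np.hstack([emb, crafted])


X_train = make_features(n_train)
X_test = make_features(n_test)
# Hypothetical binary labels (e.g. sexist / not sexist).
y_train = rng.integers(0, 2, size=n_train)

# Fit PCA on the training features only, then project both splits.
pca = PCA(n_components=n_components, random_state=0)
Z_train = pca.fit_transform(X_train)
Z_test = pca.transform(X_test)

# Random Forest on the reduced features produces the final predictions.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(Z_train, y_train)
pred = clf.predict(Z_test)         # hard labels
proba = clf.predict_proba(Z_test)  # per-class probabilities
print(Z_train.shape, pred.shape, proba.shape)
```

`predict` yields hard labels, while `predict_proba` yields per-class probabilities that could feed a soft evaluation. Because hand-crafted features may live on very different scales than embedding dimensions, standardizing the concatenated matrix before PCA is a design choice worth considering; whether the original system did so is not stated here.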
Original language | English |
---|---|
Title of host publication | Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2023) |
Editors | Mohammed Aliannejadi, Guglielmo Faggioli, Nicola Ferro, Michalis Vlachos |
Place of Publication | Thessaloniki, Greece |
Publisher | CEUR-WS |
Pages | 878-890 |
Number of pages | 13 |
Volume | 3497 |
ISSN (Electronic) | 1613-0073 |
Publication status | Published - 5 Oct 2023 |
Event | CLEF 2023 Conference and Labs of the Evaluation Forum: Information Access Evaluation meets Multilinguality, Multimodality, and Visualization - Thessaloniki, Greece. Duration: 18 Sept 2023 → 21 Sept 2023 |
Conference
Conference | CLEF 2023 Conference and Labs of the Evaluation Forum |
---|---|
Abbreviated title | CLEF 2023 |
Country/Territory | Greece |
City | Thessaloniki |
Period | 18/09/23 → 21/09/23 |
Research Field
- Former Research Field - Data Science
Keywords
- Sexism detection
- Sexism identification
- Social Media Retrieval
- Transformer Models
- Natural Language Processing