Sexism has become a widespread problem on social media and in online conversations. To address this issue, the sEXism Identification in Social neTworks (EXIST) challenge was held at CLEF 2023. In this year's edition of the international benchmark, the goal is to automatically identify sexism in texts using Natural Language Processing (NLP). The tasks are to determine whether a text is sexist, what the source intention behind it is, and which category of sexism it belongs to. This paper presents the contribution of our team, AIT_FHSTP, to the EXIST challenge. We present three approaches to the classification tasks of this year's shared task. All three approaches build on an XLM-RoBERTa model that was pre-trained on additional datasets and fine-tuned on the EXIST2023 data. For our second and third approaches, we extracted the fine-tuned embeddings of the model and concatenated them with additional features: in the second approach, we added sentiment and toxicity model embeddings; in the third, we added multiple hand-crafted features and reduced the dimensionality with PCA. We then used these embeddings as input to a Random Forest classifier, which generated the final predictions. Our approach combining XLM-RoBERTa embeddings with additional hand-crafted features and PCA achieved 1st place in the soft-soft evaluation of Task 2 (source intention) for Spanish content and 2nd place for English content. For Task 3 (sexism multilabel categorization), we achieved 3rd place in the hard-hard evaluation.
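The feature-fusion step of the second and third approaches (concatenate embeddings with extra features, reduce with PCA, classify with a Random Forest) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the team's implementation: random arrays stand in for the fine-tuned XLM-RoBERTa embeddings (768 dimensions, as for the base model) and for five hypothetical hand-crafted features, and the standardization step, the 50-component PCA target and the forest size are all assumed values. The sketch applies PCA to the full concatenated vector; the abstract does not say whether PCA was applied to the whole vector or only to the added features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-ins (assumption): in practice these would be the
# fine-tuned XLM-RoBERTa text embeddings and per-text hand-crafted features.
n_texts = 200
transformer_emb = rng.normal(size=(n_texts, 768))  # e.g. one vector per text
handcrafted = rng.normal(size=(n_texts, 5))        # hypothetical extra features
labels = rng.integers(0, 2, size=n_texts)          # 0 = not sexist, 1 = sexist (random here)


def build_features(emb: np.ndarray, extra: np.ndarray) -> np.ndarray:
    """Concatenate transformer embeddings with additional per-text features."""
    return np.hstack([emb, extra])


X = build_features(transformer_emb, handcrafted)

clf = make_pipeline(
    StandardScaler(),                        # assumption: scale features before PCA
    PCA(n_components=50, random_state=0),    # assumed target dimensionality
    RandomForestClassifier(n_estimators=200, random_state=0),
)
clf.fit(X, labels)

probs = clf.predict_proba(X[:3])  # per-class probabilities for three texts
preds = clf.predict(X[:3])        # hard class labels for the same texts
```

`predict_proba` returns per-class probabilities, which is one possible way to obtain soft outputs alongside the hard labels from `predict`; whether the submitted soft predictions were produced this way is not stated in the abstract.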
CLEF 2023: Conference and Labs of the Evaluation Forum
18/09/23 → 21/09/23
- Former Research Field - Data Science