The paper "GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual detection of Hate speech against Immigrants and Women on Twitter", by Diego Benito, Óscar Araque, and Carlos A. Iglesias, has been published at the Thirteenth International Workshop on Semantic Evaluation (SemEval-2019).
The SemEval workshop focuses on the evaluation and comparison of systems that analyse diverse semantic phenomena in text. Its aims are to extend the current state of the art in semantic analysis and to create high-quality annotated datasets for a range of increasingly challenging problems in natural language semantics. In particular, SemEval-2019 Task 5 addresses the detection of hate speech directed at two specific targets, immigrants and women, from a multilingual perspective covering Spanish and English.
The publication represents the first major achievement of the Intelligent Systems Group in the field of hate speech detection: the system reached an honorable fifth position in the Spanish sub-task A and was the best European system in the same sub-task.
Abstract. This paper describes the GSI-UPM system for SemEval-2019 Task 5, which tackles multilingual detection of hate speech on Twitter. The main contribution of the paper is the use of a method based on word embeddings and semantic similarity combined with traditional paradigms, such as n-grams, TF-IDF and POS. This combination of several features is fine-tuned through ablation tests, demonstrating the usefulness of different features. While our approach outperforms baseline classifiers on different sub-tasks, the best of our submitted runs reached the 5th position on the Spanish sub-task A.
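As a rough illustration of the idea in the abstract, the sketch below concatenates a traditional n-gram feature vector with features derived from embedding-based semantic similarity. It is not the authors' implementation: the toy three-dimensional vectors, the seed word list, and the function names (`similarity_features`, `ngram_features`, `featurize`) are all assumptions made for this example, and TF-IDF weighting and POS features are left out for brevity.

```python
import math
from collections import Counter

# Toy 3-dimensional word vectors: purely illustrative, NOT trained embeddings.
TOY_EMBEDDINGS = {
    "hate": [0.9, 0.1, 0.0],
    "despise": [0.8, 0.2, 0.1],
    "love": [-0.7, 0.6, 0.1],
    "people": [0.0, 0.1, 0.9],
}

# Hypothetical seed lexicon that tweet words are compared against.
SEED_WORDS = ["hate"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_features(tokens):
    """One feature per seed word: the highest similarity to any known token."""
    features = []
    for seed in SEED_WORDS:
        sims = [
            cosine(TOY_EMBEDDINGS[tok], TOY_EMBEDDINGS[seed])
            for tok in tokens
            if tok in TOY_EMBEDDINGS
        ]
        features.append(max(sims, default=0.0))
    return features


def ngram_features(tokens, vocabulary, n=2):
    """Raw counts of each vocabulary n-gram in the token sequence."""
    grams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return [float(grams[g]) for g in vocabulary]


def featurize(text, vocabulary):
    """Concatenate n-gram counts with semantic-similarity features."""
    tokens = text.lower().split()
    return ngram_features(tokens, vocabulary) + similarity_features(tokens)
```

A combined vector of this kind could then be passed to any standard classifier, and an ablation test in the sense of the abstract would amount to dropping one feature group at a time and comparing the resulting scores.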
The SemEval-2019 workshop was held on June 6-7, 2019, in Minneapolis, USA, co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019).