The paper "GSI-UPM at SemEval-2019 Task 5: Semantic Similarity and Word Embeddings for Multilingual detection of Hate speech against Immigrants and Women on Twitter", by Diego Benito, Óscar Araque, and Carlos A. Iglesias, has been published at the Thirteenth International Workshop on Semantic Evaluation (SemEval-2019).

The SemEval workshop focuses on evaluating and comparing systems that analyse diverse semantic phenomena in text. Its aims are to extend the state of the art in semantic analysis and to create high-quality annotated datasets for a range of increasingly challenging problems in natural language semantics. In particular, SemEval-2019 Task 5 targets the detection of hate speech directed at two specific groups, immigrants and women, from a multilingual perspective covering Spanish and English.

This publication is the first major achievement of the Intelligent Systems Group in hate speech detection: the system ranked fifth in the Spanish sub-task A and was the best European system in that sub-task.

Abstract. This paper describes the GSI-UPM system for SemEval-2019 Task 5, which tackles multilingual detection of hate speech on Twitter. The main contribution of the paper is the use of a method based on word embeddings and semantic similarity combined with traditional paradigms, such as n-grams, TF-IDF and POS. This combination of several features is fine-tuned through ablation tests, demonstrating the usefulness of different features. While our approach outperforms baseline classifiers on different sub-tasks, the best of our submitted runs reached the 5th position on the Spanish sub-task A.
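As a rough illustration of the idea in the abstract, the sketch below combines TF-IDF n-gram features with embedding-based similarity features and lets each group be switched off, as one would in an ablation test. It is not the authors' implementation: the two-dimensional `TOY_EMBEDDINGS`, the seed words, the max-pooling choice, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter

# Toy 2-d word vectors; purely illustrative, not trained embeddings.
TOY_EMBEDDINGS = {
    "hate": (1.0, 0.0),
    "despise": (0.9, 0.1),
    "love": (0.0, 1.0),
    "adore": (0.1, 0.9),
    "people": (0.5, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ngrams(tokens, n):
    """Contiguous word n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_features(docs, n=1):
    """Return (vocabulary, rows): one TF-IDF row per tokenised document."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    vocab = sorted({gram for c in counts for gram in c})
    doc_freq = Counter(gram for c in counts for gram in c)
    rows = []
    for c in counts:
        total = sum(c.values()) or 1
        rows.append([(c[g] / total) * math.log(len(docs) / doc_freq[g])
                     for g in vocab])
    return vocab, rows


def similarity_features(tokens, seed_words):
    """For each seed word, the highest cosine similarity to any known token."""
    feats = []
    for seed in seed_words:
        sims = [cosine(TOY_EMBEDDINGS[t], TOY_EMBEDDINGS[seed])
                for t in tokens if t in TOY_EMBEDDINGS]
        feats.append(max(sims, default=0.0))
    return feats


def build_features(docs, seed_words, use_tfidf=True, use_similarity=True):
    """Concatenate the enabled feature groups; the flags allow ablation runs."""
    _, tfidf_rows = tfidf_features(docs)
    combined = []
    for doc, row in zip(docs, tfidf_rows):
        feats = []
        if use_tfidf:
            feats += row
        if use_similarity:
            feats += similarity_features(doc, seed_words)
        combined.append(feats)
    return combined


docs = [["i", "despise", "people"], ["i", "adore", "people"]]
features = build_features(docs, seed_words=["hate", "love"])
similarity_only = build_features(docs, ["hate", "love"], use_tfidf=False)
```

In an ablation setting, the same classifier would be trained once per flag combination and the scores compared to see which feature group contributes. A real system would replace the toy vectors with pre-trained embeddings and add the remaining features the abstract mentions, such as POS tags.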

The SemEval-2019 workshop was held on June 6-7, 2019, in Minneapolis, USA, co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019).