In recent years, Internet activity and social network connectivity have increased. Social networks are platforms that ease communication between users through different kinds of interaction. Unfortunately, they have also become places where hate speech proliferates.
Hate speech has become a prominent topic in recent years. This is reflected not only in the increased media coverage of the problem but also in the growing political attention it receives. Given the constant growth of this phenomenon, institutions, international minority associations, researchers and social networks are trying to react as quickly as possible. Because of the massive scale of social networks, methods that automatically detect hate speech are required. Natural Language Processing (NLP) focusing specifically on this phenomenon is needed, since basic word filters do not provide a sufficient remedy: whether an utterance constitutes hate speech may depend on aspects such as the domain, the context, and co-occurring media objects (images, video, audio).
This thesis is the result of a project whose main aim was to build a hate speech detector with a multilingual perspective, in order to remove all forms of hate speech that can occur in social networks, regardless of the source language. The system was developed using supervised machine learning tools and NLP techniques, with Python as the programming language.
The proposed system is evaluated on two case studies: participation in SemEval, an internationally recognized competition, and a Transfer Learning challenge across languages and hate speech traits. The extensive experimentation carried out resulted in a very honorable position in the SemEval competition and demonstrated the benefits that applying Transfer Learning can bring to the hate speech detection problem.