Generation and Evaluation of Lexicons Focused on HateSpeech using Deep Neural Networks

David Algarra Medina. (2021). Generation and Evaluation of Lexicons Focused on HateSpeech using Deep Neural Networks. Final Career Project (TFM). Universidad Politécnica de Madrid, ETSI Telecomunicación.

The rise in the use of social media and the consumption of online content in recent years has changed the way people use the Internet and obtain information. The amount of information generated every day is of unprecedented magnitude, with all kinds of messages flowing from all points of view. Within this flow, the freedom with which hate messages spread without any kind of filter has been a concern for some years now. It leads to situations in which groups or minorities are discriminated against, and it may cause physical and mental harm to the people affected. Freedom of expression allows everyone to voice their opinions freely, but this right comes with responsibilities and can be restricted when a message interferes with the human rights of others. Messages supporting racism, discriminating against women or members of the LGBT community, or propagating Nazi ideology are not allowed by local authorities or by social media platforms.

This work focuses on generating a lexicon in which each word is assigned a value indicating how strongly it is related to hate speech. To this end, Natural Language Processing and sentiment analysis techniques are used. First, a study is carried out to determine, from a collection of texts, whether words carry positive or negative sentiment. Second, the techniques developed for that problem are applied to generate a lexicon that captures the degree of hate language associated with each word. Finally, the resulting lexicons are evaluated on real text to check whether they can identify sentiment or hate language in sentences or paragraphs. This evaluation demonstrates the scope and potential impact that these tools can have in the future.
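The abstract does not say how a word-level lexicon is applied to whole sentences or paragraphs during evaluation. One common, simple approach is to average the scores of the words that appear in the lexicon and compare the result against a threshold. The following is a minimal sketch of that idea only, assuming a lexicon that maps each word to a hate score between 0 and 1. The functions `score_text` and `classify`, the 0.5 threshold, and the toy lexicon entries are all hypothetical and are not taken from this work or from the lexicons it produces.

```python
import re
from typing import Dict, Optional

# Hypothetical toy lexicon: word -> hate score in [0, 1].
# These values are invented for illustration, not from the generated lexicons.
TOY_LEXICON: Dict[str, float] = {
    "hate": 0.9,
    "despise": 0.8,
    "love": 0.0,
    "friends": 0.1,
}


def score_text(text: str, lexicon: Dict[str, float]) -> Optional[float]:
    """Average the lexicon scores of the tokens found in the lexicon.

    Returns None when no token of the text appears in the lexicon,
    so "no evidence" is distinguishable from a score of 0.
    """
    # Simple lowercase word tokenization (assumption: no stemming or lemmatization).
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    if not hits:
        return None
    return sum(hits) / len(hits)


def classify(text: str, lexicon: Dict[str, float], threshold: float = 0.5) -> bool:
    """Flag a text as hateful when its average score reaches the (hypothetical) threshold."""
    score = score_text(text, lexicon)
    return score is not None and score >= threshold


if __name__ == "__main__":
    for sentence in ["I hate and despise them", "I love my friends", "The weather is nice"]:
        print(sentence, score_text(sentence, TOY_LEXICON), classify(sentence, TOY_LEXICON))
```

Averaging keeps the score on the same 0 to 1 scale as the lexicon regardless of text length. A real evaluation might instead weight words, handle negation, or feed lexicon features into a classifier; the sketch does none of that.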