Generation and Evaluation of Lexicons Focused on HateSpeech using Deep Neural Networks

David Algarra Medina (2021). Generation and Evaluation of Lexicons Focused on HateSpeech using Deep Neural Networks. Master's Thesis (TFM, Trabajo Fin de Máster). Universidad Politécnica de Madrid, ETSI Telecomunicación.

Abstract:

The rise in the use of social media and the consumption of online content in recent years has changed the way people use the Internet and get information. The amount of information generated every day is of unprecedented magnitude, with all kinds of messages flowing from all points of view. Among this information, the freedom with which hate messages spread without any kind of filter has been a concern for some years now, leading to situations in which groups or minorities are discriminated against and which may cause physical and mental harm to the people affected. Freedom of expression allows everyone to express their opinions freely, but this right comes with responsibilities and can be restricted if the message interferes with the human rights of other persons. Messages that support racism, discriminate against women or members of the LGBT community, or propagate Nazi ideology are not allowed by local authorities or social media platforms. This work focuses on generating a lexicon in which each word is annotated with how strongly it is related to hate speech. To do so, Natural Language Processing and sentiment analysis techniques are used. First, a study is carried out to determine, from a collection of texts, whether words carry positive or negative sentiment. Second, the techniques developed for that problem are applied to generate a lexicon that captures the degree of hateful language each word carries. Finally, the resulting lexicons are evaluated on real text to check whether they can identify sentiment or hate language in sentences or paragraphs. This demonstrates the scope and potential impact that these tools can have in the future.
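The abstract does not state how a lexicon is applied to sentences during evaluation. As a purely illustrative sketch, and not the method used in the thesis, one simple way to score real text with a word-level lexicon is to average the scores of the words the lexicon covers and compare the result with a threshold. The lexicon entries and their values, the `score_text` and `is_hateful` helpers, and the 0.5 threshold are all hypothetical.

```python
import re

# Hypothetical toy lexicon: word -> hate score in [0, 1].
# The values are invented for illustration; they are NOT taken from the thesis.
TOY_HATE_LEXICON = {
    "hate": 0.9,
    "scum": 0.8,
    "love": 0.0,
    "friends": 0.05,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract runs of letters (and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text: str, lexicon: dict[str, float]) -> float:
    """Average lexicon score of the tokens that appear in the lexicon.

    Tokens missing from the lexicon are ignored; returns 0.0 if none match.
    """
    scores = [lexicon[tok] for tok in tokenize(text) if tok in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def is_hateful(text: str, lexicon: dict[str, float], threshold: float = 0.5) -> bool:
    """Flag a text when its average score reaches a (hypothetical) threshold."""
    return score_text(text, lexicon) >= threshold


if __name__ == "__main__":
    for sentence in ["I love my friends", "I hate those scum"]:
        print(sentence, "->", round(score_text(sentence, TOY_HATE_LEXICON), 3))
```

Averaging keeps the score on the same scale as the lexicon regardless of sentence length; a real evaluation would also need to handle negation, out-of-vocabulary words and context, which a bag-of-words lookup like this ignores.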

BibTeX:

@mastersthesis{algarra2021master,
author = "Algarra Medina, David",
abstract = "The rise in the use of social media and the consumption of online content in recent years has changed the way people use the Internet and get information. The amount of information generated every day is of unprecedented magnitude, with all kinds of messages flowing from all points of view. Among this information, the freedom with which hate messages spread without any kind of filter has been a concern for some years now, leading to situations in which groups or minorities are discriminated against and which may cause physical and mental harm to the people affected.
Freedom of expression allows everyone to express their opinions freely, but this right comes with responsibilities and can be restricted if the message interferes with the human rights of other persons. Messages that support racism, discriminate against women or members of the LGBT community, or propagate Nazi ideology are not allowed by local authorities or social media platforms.
This work focuses on generating a lexicon in which each word is annotated with how strongly it is related to hate speech. To do so, Natural Language Processing and sentiment analysis techniques are used. First, a study is carried out to determine, from a collection of texts, whether words carry positive or negative sentiment. Second, the techniques developed for that problem are applied to generate a lexicon that captures the degree of hateful language each word carries.
Finally, the resulting lexicons are evaluated on real text to check whether they can identify sentiment or hate language in sentences or paragraphs. This demonstrates the scope and potential impact that these tools can have in the future.",
address = "ETSI Telecomunicaci{\'o}n",
school = "Universidad Polit{\'e}cnica de Madrid",
keywords = "sentiment analysis;hate speech;lexical resources",
month = "June",
title = "{G}eneration and {E}valuation of {L}exicons {F}ocused on {H}ate{S}peech using {D}eep {N}eural {N}etworks",
type = "TFM",
year = "2021",
}