Development and evaluation of a Natural Language Processing system for radical propaganda detection using the Moral Foundations Theory and Affect Analysis

Alonso del Real, P. (2023). Development and evaluation of a Natural Language Processing system for radical propaganda detection using the Moral Foundations Theory and Affect Analysis. Final Degree Project (TFG). Universidad Politécnica de Madrid, ETSI Telecomunicación.

Abstract:
Over the past few years, the world has experienced a significant polarization of opinions and ideologies, largely due to the vast amount of information available on the Internet. The Internet has also given organized terrorist groups more resources and opportunities to spread their extremist and harmful discourse to society as a whole. This work continues previous research on detecting radical propaganda in texts using Natural Language Processing (NLP), since the analysis of social cues can help to examine, identify and predict extremist users. The objective is to determine which approaches to text feature selection help identify these cues, and thus to classify the analyzed texts into two categories: radical or non-radical. The effectiveness of each evaluated model is measured quantitatively and, in some cases, explained graphically using techniques for interpreting the predictions of machine learning models, such as SHapley Additive exPlanations (SHAP). More specifically, the work evaluates the performance of an approach based on moral foundations in combination with affective cues and a technique based on semantic similarity. This particular fusion of feature extraction methods has not been done before, so the results provide useful information for the field of radical propaganda detection in written language. The results show that morality can effectively support the proposed task through lexicons based on the Moral Foundations Theory (MFT). For this purpose, two different vocabularies are used, both created to provide extra information and added value to the machine learning model being developed. The results obtained with each vocabulary are compared and analyzed to see which is more useful in which situation.
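As a rough illustration of the lexicon-based feature extraction described in the abstract, the following minimal Python sketch turns a text into normalized counts per moral and affective category. It is not the thesis's implementation: the toy word lists, the category names, and the functions `tokenize`, `lexicon_features` and `extract_features` are all hypothetical, and the actual MFT and affect vocabularies used in the work are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy lexicons for illustration only; the real MFT-based
# vocabularies and affect resources used in the thesis are not shown here.
MORAL_LEXICON = {
    "care": {"protect", "harm", "suffer"},
    "loyalty": {"betray", "brothers", "traitor"},
    "purity": {"pure", "filth", "sacred"},
}
AFFECT_LEXICON = {
    "anger": {"hate", "enemy", "revenge"},
    "fear": {"threat", "afraid", "danger"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_features(text, lexicon, prefix):
    """Return the share of tokens that fall in each lexicon category."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        f"{prefix}_{category}": sum(counts[w] for w in words) / total
        for category, words in sorted(lexicon.items())
    }


def extract_features(text):
    """Combine moral and affective lexicon features into one dictionary."""
    features = lexicon_features(text, MORAL_LEXICON, "moral")
    features.update(lexicon_features(text, AFFECT_LEXICON, "affect"))
    return features


if __name__ == "__main__":
    for name, value in extract_features("Protect our brothers from the enemy").items():
        print(f"{name}: {value:.3f}")
```

In a pipeline of the kind the abstract outlines, a feature dictionary like this could be concatenated with semantic-similarity features and passed to a binary classifier (radical or non-radical), whose predictions could then be inspected with SHAP. Those stages are omitted here because the abstract does not specify them.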