Design and Development of a Machine Learning System for Opinion and Natural Language Analysis in Social Media. Application to the Ride-Hailing and Radicalization Domains

de Pablo Marsal, Á. (2021). Design and Development of a Machine Learning System for Opinion and Natural Language Analysis in Social Media. Application to the Ride-Hailing and Radicalization Domains. Final Career Project.

Abstract:
The analysis of the content of posts written on social media has become an important line of research in recent years. Studying these texts, their relationships with each other, and their dependence on the platform on which they are written makes it possible to analyze the behavior of users and their opinions across different domains. The application of Artificial Intelligence techniques and algorithms, specifically from the field of Natural Language Processing (NLP), has advanced this work to the point where it is possible to develop models that predict the subject matter of posts or the way in which a certain topic is discussed. This is essential for understanding users' opinions on a particular topic, for measuring the satisfaction of a service's customers, or even for identifying hate speech or messages with clearly extremist content. In this project, a system has been developed that automatically analyzes, in real time, the content of posts written on different social media platforms in order to study the language used and the opinions of these users on different topics, specifically those related to mobility platforms (Ride-Hailing) and topics of an extremist and radical nature. The developed system is based on several NLP and Machine Learning techniques, such as Topic Modeling, Sentiment Analysis, and the creation of classification models, among others. The results of this analysis can be explored through a visualization module, where they are shown in aggregated form and can be filtered to perform a customized analysis. Finally, this project studies the feasibility of applying the developed classification models to other types of data on which the models have not been trained. This would allow the development of applications whose use could be extended to other media, facilitating the reuse of models that have been generated from limited datasets.