Publication - Study and Detection of Anti-Vaccine Language on Social Media using Natural Language Processing Techniques

Mª Felipa Ledesma Corniel. (2023). Study and Detection of Anti-Vaccine Language on Social Media using Natural Language Processing Techniques. Final Degree Project (TFG). Universidad Politécnica de Madrid, ETSI Telecomunicación.

Abstract:

This project presents an analysis of anti-vaccine discourse on the social media platform Twitter using Natural Language Processing (NLP) techniques. Vaccines are biomedical technologies that stimulate the immune system to generate a protective response against specific diseases. However, despite their well-established efficacy throughout history, they generate distrust and fear in a portion of the population.

To investigate this movement, we collect anti-vaccine messages from Twitter. Social media platforms provide a valuable source for NLP analysis because of the abundance of data and the diverse perspectives shared in real time. An exploratory investigation is carried out to identify and understand how these communities communicate on Twitter. Subsequently, a comprehensive dataset comprising tweets from recent years is collected and carefully curated for further analysis.

First, text preprocessing is performed to ensure the quality and integrity of the dataset. Next, advanced NLP techniques are employed to uncover linguistic patterns and structures within the collected tweets. These techniques include n-gram extraction and temporal vocabulary analysis to identify changes in language use over time. Furthermore, a comparative analysis is conducted to identify differences in perspectives and arguments between anti-vaccine and pro-vaccine individuals. To delve deeper, a pre-trained language model is used to perform sentiment and irony analysis, aiming to detect variations in the emotional tone associated with different terms. Lastly, state-of-the-art classification algorithms are employed to draw conclusions regarding the disparities between categories.

This research seeks to uncover underlying patterns, prevalent themes, and persuasive strategies that contribute to the dissemination of the anti-vaccine movement. The findings of this study are expected to serve as a foundation for further research, policy discussions, and targeted interventions aimed at addressing and combating this issue.
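
As an illustration of the preprocessing, n-gram extraction, and temporal vocabulary steps named in the abstract, the following is a minimal sketch only. It is not the thesis's actual pipeline: the cleaning rules, the function names (`preprocess`, `ngrams`, `top_ngrams_by_period`), and the assumed input shape of `(period, text)` pairs are all assumptions made for this example, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical cleaning rules; the thesis may use different ones.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and split it into tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return TOKEN_RE.findall(text)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_period(tweets, n=2, k=3):
    """Count n-grams separately for each period and keep the k most frequent.

    `tweets` is an iterable of (period, text) pairs, where `period` is any
    hashable label such as a year. This input shape is an assumption.
    """
    counts = {}
    for period, text in tweets:
        counts.setdefault(period, Counter()).update(ngrams(preprocess(text), n))
    return {period: c.most_common(k) for period, c in counts.items()}
```

Comparing the per-period rankings returned by `top_ngrams_by_period` is one simple way to see how vocabulary shifts over time; a real study would likely also remove stop words and normalize counts by the number of tweets in each period.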

Bibtex:

@mastersthesis{ledesma2023tfg,
author = "Ledesma Corniel, Mª Felipa",
abstract = "This project presents an analysis of anti-vaccine discourse on the social media platform Twitter using Natural Language Processing (NLP) techniques. Vaccines are biomedical technologies that stimulate the immune system to generate a protective response against specific diseases. However, despite their well-established efficacy throughout history, they generate distrust and fear in a portion of the population.

To investigate this movement, we collect anti-vaccine messages from Twitter. Social media platforms provide a valuable source for NLP analysis because of the abundance of data and the diverse perspectives shared in real time. An exploratory investigation is carried out to identify and understand how these communities communicate on Twitter. Subsequently, a comprehensive dataset comprising tweets from recent years is collected and carefully curated for further analysis.

First, text preprocessing is performed to ensure the quality and integrity of the dataset. Next, advanced NLP techniques are employed to uncover linguistic patterns and structures within the collected tweets. These techniques include n-gram extraction and temporal vocabulary analysis to identify changes in language use over time. Furthermore, a comparative analysis is conducted to identify differences in perspectives and arguments between anti-vaccine and pro-vaccine individuals. To delve deeper, a pre-trained language model is used to perform sentiment and irony analysis, aiming to detect variations in the emotional tone associated with different terms. Lastly, state-of-the-art classification algorithms are employed to draw conclusions regarding the disparities between categories.

This research seeks to uncover underlying patterns, prevalent themes, and persuasive strategies that contribute to the dissemination of the anti-vaccine movement. The findings of this study are expected to serve as a foundation for further research, policy discussions, and targeted interventions aimed at addressing and combating this issue.
",
address = "ETSI Telecomunicaci{\'o}n",
school = "Universidad Polit{\'e}cnica de Madrid",
keywords = "natural language processing;machine learning;sentiment analysis;transformers;anti-vaccines",
month = "June",
title = "{S}tudy and {D}etection of {A}nti-{V}accine {L}anguage on {S}ocial {M}edia using {N}atural {L}anguage {P}rocessing {T}echniques",
type = "TFG",
year = "2023",
}