Design and Development of a Lexicon-based Emotion Classifier for the Sports Domain on Twitter

Luis García. (2019). Design and Development of a Lexicon-based Emotion Classifier for the Sports Domain on Twitter. Final Career Project. Universidad Politécnica de Madrid.

Abstract:

Objective: The use of social media has increased rapidly in the last few years. People use it to express their points of view, opinions, perceptions, and feelings on a myriad of topics. This is especially evident in the sports context on Twitter, where users share their preferences, their favorites in different categories, and so on. This massive quantity of data can be exploited to analyze events and even to predict future ones (e.g., stock market movements). In the coming years, Twitter is expected to be used even more widely, generating large volumes of data that could be very useful under proper analysis. Furthermore, the number of Olympic events has also grown: the Summer Olympic Games (the traditional ones), the Winter Olympic Games, the Youth Olympic Games, the Summer Paralympic Games, the Winter Paralympic Games, and others. In addition, this domain has not been properly studied, although previous studies indicate that its idiosyncrasies are relevant. The objective of this project is to improve the use of an existing lexicon to detect feelings and emotions. Under this objective, both the improvement of the existing lexicon and the generation of alternative lexicons that better capture emotional information can be addressed.

Methodology: We will build a machine learning system that uses a dataset composed of tweets related to Olympic events and a lexicon that associates certain keywords with emotions. The system will be evaluated on real data using standard performance metrics (accuracy, F1-score).

Technologies: The project will be based on Python, a general-purpose programming language with powerful and convenient libraries for data analysis and machine learning, such as pandas, NumPy, SciPy, and scikit-learn. The resulting data will be published following the principles of Linked Open Data, through a system built on SPARQL, Elasticsearch, and Fuseki.
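
The methodology above can be illustrated with a minimal sketch. It is not the thesis implementation: the toy lexicon, the example tweets, the gold labels, and the helper names (`classify`, `accuracy`, `macro_f1`) are all invented here for illustration. The sketch assigns each tweet the emotion whose lexicon keywords occur most often, then scores the predictions with accuracy and macro-averaged F1. The metrics are written in plain Python to keep the example self-contained; in a scikit-learn setting, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` would be the usual choice.

```python
from collections import Counter

# Hypothetical toy lexicon (keyword -> emotion); a real lexicon is far larger.
TOY_LEXICON = {
    "win": "joy",
    "gold": "joy",
    "proud": "joy",
    "lost": "sadness",
    "injury": "sadness",
    "cheat": "anger",
    "unfair": "anger",
}


def classify(tweet, lexicon, default="neutral"):
    """Return the emotion whose lexicon keywords occur most often in the tweet."""
    tokens = [tok.strip(".,!?#@").lower() for tok in tweet.split()]
    counts = Counter(lexicon[tok] for tok in tokens if tok in lexicon)
    if not counts:
        return default  # no lexicon keyword matched
    # Highest count wins; ties are broken alphabetically for determinism.
    return min(counts, key=lambda emotion: (-counts[emotion], emotion))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented example tweets with invented gold labels.
tweets = [
    "So proud of our team, gold at last!",
    "Heartbreaking injury, she lost the final",
    "That referee call was so unfair",
    "Watching the opening ceremony tonight",
    "Cannot believe we did it!",  # joyful, but no lexicon keyword matches
]
gold = ["joy", "sadness", "anger", "neutral", "joy"]

predictions = [classify(tweet, TOY_LEXICON) for tweet in tweets]
print(predictions)
print(f"accuracy {accuracy(gold, predictions):.2f}, "
      f"macro-F1 {macro_f1(gold, predictions):.2f}")
```

The last tweet is misclassified because none of its words appear in the lexicon. This is the kind of coverage gap that improving the lexicon or generating alternative lexicons would aim to close.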

BibTeX:

@mastersthesis{garcia2019sports,
author = "Garc{\'i}a, Luis",
abstract = "Objective:
The use of social media has increased rapidly in the last few years. People use it to express their points of view, opinions, perceptions, and feelings on a myriad of topics. This is especially evident in the sports context on Twitter, where users share their preferences, their favorites in different categories, and so on. This massive quantity of data can be exploited to analyze events and even to predict future ones (e.g., stock market movements). In the coming years, Twitter is expected to be used even more widely, generating large volumes of data that could be very useful under proper analysis.
Furthermore, the number of Olympic events has also grown: the Summer Olympic Games (the traditional ones), the Winter Olympic Games, the Youth Olympic Games, the Summer Paralympic Games, the Winter Paralympic Games, and others. In addition, this domain has not been properly studied, although previous studies indicate that its idiosyncrasies are relevant.
The objective of this project is to improve the use of an existing lexicon to detect feelings and emotions. Under this objective, both the improvement of the existing lexicon and the generation of alternative lexicons that better capture emotional information can be addressed.
Methodology:
We will build a machine learning system that uses a dataset composed of tweets related to Olympic events and a lexicon that associates certain keywords with emotions. The system will be evaluated on real data using standard performance metrics (accuracy, F1-score).
Technologies: 
The project will be based on Python, a general-purpose programming language with powerful and convenient libraries for data analysis and machine learning, such as pandas, NumPy, SciPy, and scikit-learn. The resulting data will be published following the principles of Linked Open Data, through a system built on SPARQL, Elasticsearch, and Fuseki.",
school = "Universidad Polit{\'e}cnica de Madrid",
title = "{D}esign and {D}evelopment of a {L}exicon-based {E}motion {C}lassifier for the {S}ports {D}omain on {T}witter",
year = "2019",
}