Development of a Social Media Crawler for Sentiment Analysis

José Emilio Carmona. (2016). Development of a Social Media Crawler for Sentiment Analysis. Final Career Project. ETSI Telecomunicación, Universidad Politécnica de Madrid.

This thesis presents the results of a project whose objective is to design and develop the following components: (1) a comment collection system for social networks and recommendation sites; (2) GSI Crawler, a website that uses this system to collect and analyze comments from the different websites; and (3) a service to schedule, monitor and administer the crawling system.

First, the development of the scrapers that collect comments is described. One scraper has been developed for each website. Facebook, Twitter and YouTube offer the necessary information through a specific API. In contrast, Amazon, Yelp and TripAdvisor do not offer an API from which the comments can be extracted, so a custom scraper had to be developed for each of these websites.

Next, the development of GSI Crawler is described. This website supports the analysis of comments from any of the websites mentioned above. The user chooses the type of analysis to carry out (Emotions, Sentiments or Fake Analysis) and supplies, for instance, a direct URL to a Yelp business, the id of a Facebook fan page or a YouTube video. GSI Crawler downloads the comments belonging to that element and then runs the chosen analysis using the Senpy tool. Once the analysis is finished, a summary of the results is shown, along with the option to review each comment one by one.

Finally, we present the conclusions drawn from this project, the technologies we have learned during its development and possible lines of future work.
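The workflow outlined in this abstract (choose a site and an analysis type, collect the comments through an API or a custom scraper, analyze them, then summarize) can be illustrated with a minimal Python sketch. It is not the code of GSI Crawler: the names `Comment`, `collection_method` and `run_job` are hypothetical, the fetchers are supplied by the caller, and the `analyze` callable merely stands in for the call to Senpy, whose interface is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Sites named in the abstract, grouped by how their comments are obtained.
API_SITES = {"facebook", "twitter", "youtube"}
SCRAPED_SITES = {"amazon", "yelp", "tripadvisor"}
# Analysis types named in the abstract (identifiers are illustrative).
ANALYSES = {"emotions", "sentiments", "fake"}


@dataclass
class Comment:
    site: str
    text: str


# Hypothetical signatures: a fetcher takes (site, target) and returns comments;
# an analyzer takes (analysis, text) and returns a label.
Fetcher = Callable[[str, str], List[Comment]]
Analyzer = Callable[[str, str], str]


def collection_method(site: str) -> str:
    """Return 'api' or 'scraper' depending on the site."""
    key = site.lower()
    if key in API_SITES:
        return "api"
    if key in SCRAPED_SITES:
        return "scraper"
    raise ValueError(f"unsupported site: {site}")


def run_job(site: str, target: str, analysis: str,
            fetchers: Dict[str, Fetcher], analyze: Analyzer) -> dict:
    """Collect the comments for one target, label each one, and summarize."""
    if analysis not in ANALYSES:
        raise ValueError(f"unknown analysis: {analysis}")
    fetch = fetchers[collection_method(site)]
    comments = fetch(site, target)
    labels = [analyze(analysis, c.text) for c in comments]
    summary: Dict[str, int] = {}
    for label in labels:
        summary[label] = summary.get(label, 0) + 1
    details: List[Tuple[Comment, str]] = list(zip(comments, labels))
    # 'summary' backs the overview; 'details' backs the per-comment review.
    return {"summary": summary, "details": details}
```

In this sketch the caller passes a dictionary with one fetcher under `"api"` and one under `"scraper"`, so the choice between the two collection strategies stays in a single place and the analysis step is independent of where the comments came from.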