Experimental evaluation of semantic technologies applied to the analysis of telephone conversations for VoC (Voice of the Customer) environments

Campos de la Mata, J. (2015). Evaluación experimental de tecnologías semánticas aplicadas al análisis de conversaciones telefónicas para entornos VoC (Voz de Cliente). Final Career Project (TFG). ETSI Telecomunicación, Universidad Politécnica de Madrid.

Abstract:
This project analyses the development work and experiments carried out to study how speech recognition, text classification, and entity and concept extraction technologies can be combined to interpret the content of telephone conversations automatically. This evaluation is relevant to Voice of the Customer (VoC) analysis applications. For this purpose, Speech Analysis for VoC, a REST API service, has been developed. The API extracts several types of semantic information from audio files containing telephone conversations: transcriptions, the topics covered in the conversation and their relevance, and entities and concepts. The service could be very useful for analysing opinions, customer feedback or complaints about a company from recordings of telephone conversations. Speech Analysis for VoC relies on the leading-edge speech processing technology of the “VoxSigma® Speech-to-Text Software Suite” offered by Vocapia Research, whose API returns the list of segments that form the audio transcription. In addition, to extract semantic meaning from the conversation, Speech Analysis for VoC uses the “Text classification” and “Sentiment Analysis” APIs, both of which will be available on the MeaningCloud.com platform. The information provided by our RESTful service is presented as semantic tags, so that results can be obtained easily and automatically. A further aim of this project is to evaluate the quality of ASR (Automatic Speech Recognition). At present, ASR is not free of problems, caused for example by different accents within the same language or by punctuation errors. An analysis of the accuracy of the output obtained from audio input was therefore needed. We not only analyse the degree of similarity between the hypothesis and the reference transcription, but also define a number of measures to compare the classes, entities and concepts obtained from Speech Analysis for VoC with reference items.
For the evaluation task we worked with the Fisher Spanish corpus from the Linguistic Data Consortium (LDC). This corpus consists of 100 conversations of about 10-12 minutes each between Spanish speakers. Processing, analysing and testing these recordings allowed us to create an evaluation batch. In this way, we obtained reliable results on the accuracy and viability of Speech Analysis for VoC for future use in practical applications.
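
The abstract does not name the specific measures used in the evaluation. As an illustration only, the sketch below assumes two common choices: word error rate (WER) for the similarity between a hypothesis and a reference transcription, and set-based precision, recall and F1 for comparing extracted classes, entities or concepts with reference items. The function names, the lowercase normalisation and the sample sentences are hypothetical and are not taken from the project.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace; the project
    may have normalised transcriptions differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must not be empty")

    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return previous[-1] / len(ref)


def precision_recall_f1(
    reference_items: Iterable[str],
    hypothesis_items: Iterable[str],
) -> Tuple[float, float, float]:
    """Compare two collections of labels (classes, entities or concepts).

    Assumption: items match when they are equal after lowercasing.
    """
    ref = {item.lower() for item in reference_items}
    hyp = {item.lower() for item in hypothesis_items}
    hits = len(ref & hyp)
    precision = hits / len(hyp) if hyp else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: one substitution and one deletion over six
# reference words, and two of three reference entities recovered.
wer = word_error_rate(
    "quiero cambiar mi tarifa de datos",
    "quiero cambiar la tarifa datos",
)
scores = precision_recall_f1(
    ["tarifa", "factura", "Madrid"],
    ["tarifa", "Madrid", "router"],
)
```

Exact matching after lowercasing is the simplest possible criterion; a real evaluation would probably also need to decide how to treat punctuation, numbers and partial entity matches before computing either measure.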