Abstract:
This project describes the development work and experiments carried out to study how
speech recognition, text classification, and entity and concept extraction technologies
can be combined to interpret the content of telephone conversations automatically.
Such an evaluation is of interest for Voice of Customer (VoC) analysis applications.
For this purpose, Speech Analysis for VoC, a REST API service, has been developed. This
API extracts several types of semantic information from audio files that contain
telephone conversations: transcriptions, the topics covered in the conversation and
their relevance, and entities and concepts. The service could be very useful for
analysing opinions, customer feedback or complaints about a company from recordings of
telephone conversations.
Speech Analysis for VoC relies on the leading-edge speech processing technology of the
“VoxSigma® Speech-to-Text Software Suite” offered by Vocapia Research. This API
returns a list of segments that together form the audio transcription. In addition, to
extract semantic meaning from the conversation, Speech Analysis for VoC uses the “Text
classification” and “Sentiment Analysis” APIs. Both APIs will be available on the
MeaningCloud.com platform. The information that our RESTful service provides is
presented as semantic tags, so that results can be obtained easily and automatically.
Evaluating the quality of ASR (Automatic Speech Recognition) is another purpose of this
project. ASR is currently not without problems, caused for example by different accents
within the same language or by punctuation errors. An analysis of the accuracy of the
output obtained from audio input was therefore needed. We not only analyse the degree of
similarity between the hypothesis and the reference transcription, but also define a
number of measures to compare the classes, entities and concepts obtained from Speech
Analysis for VoC with reference items.
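The two kinds of comparison described above can be illustrated with a minimal sketch.
The abstract does not state which measures the project uses, so the choices here are
assumptions made for illustration only: word error rate (WER), a common measure of
similarity between a hypothesis and a reference transcription, and set-based precision,
recall and F1 for classes, entities and concepts. The function names and the sample
strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lower-cased and split on whitespace; the project's
    actual normalisation (punctuation, numbers, etc.) is not known.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must not be empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def precision_recall_f1(reference_items, hypothesis_items):
    """Compare two collections of labels (classes, entities or concepts) as sets."""
    ref = set(reference_items)
    hyp = set(hypothesis_items)
    true_positives = len(ref & hyp)
    precision = true_positives / len(hyp) if hyp else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example data, not taken from the corpus.
    print(word_error_rate("hola buenos dias", "hola buenas dias"))
    print(precision_recall_f1({"Madrid", "factura"}, {"Madrid", "tarifa", "contrato"}))
```

A lower WER means the hypothesis is closer to the reference, while precision and recall
separate spurious items from missed ones, which is useful when transcription errors
propagate into the extracted entities.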
For the evaluation task we worked with the Fisher Spanish corpus from the Linguistic Data
Consortium (LDC). This corpus consists of 100 conversations of about 10-12 minutes each
between Spanish speakers. Processing, analysing and testing these recordings allowed us
to create an evaluation batch. In this way, we obtained reliable results on the accuracy
of Speech Analysis for VoC and on its viability for use in practical applications in the
future.