Audio-Visual Database for Spanish-Based Speech Recognition Systems

Abstract

Automatic speech recognition involves an understanding of what is being said. It can be audio-based, visual-based, or audio/visual-based, depending on the type of input. Modern speech recognition systems are based on machine learning techniques, such as deep learning. Deep learning systems improve their performance when more data are used to train them, so data have become one of the most valuable assets in the field of Artificial Intelligence. In this work, we present a methodology to create a database for audio/visual speech recognition. Due to the lack of Spanish datasets, we created a comprehensive Spanish-based speech recognition dataset. For this, we selected hundreds of YouTube videos, detected the facial features, and aligned the speech with the text with millisecond accuracy using IBM speech-to-text technology. We split the data by three speaker face angles, where the frontal angle represents the simple case and the left and right angles represent harder cases. As a result, we obtained a dataset of more than 100 thousand samples, each consisting of a short video with its respective annotation. Our approach can be used to generate datasets in any language by merely selecting videos in the desired language. The database and the source code to create it are open-source.
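
The sample-building step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `Word` and `Sample` types, the pause and length limits, the 20-degree frontal threshold, and the yaw sign convention (positive meaning the face is turned to the right) are all assumptions made for illustration. It takes word-level timestamps of the kind a speech-to-text service can return, groups them into short clips, and labels each clip with a face-angle bucket.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the video
    end: float


@dataclass
class Sample:
    text: str     # annotation for the clip
    start: float  # clip start, in seconds
    end: float    # clip end, in seconds
    pose: str     # "frontal", "left", or "right"


def pose_bucket(yaw_degrees: float, frontal_limit: float = 20.0) -> str:
    """Map a head yaw angle to a bucket (threshold and sign are assumptions)."""
    if abs(yaw_degrees) <= frontal_limit:
        return "frontal"
    return "right" if yaw_degrees > 0 else "left"


def group_words(words: List[Word], max_gap: float = 0.5,
                max_len: float = 3.0) -> List[List[Word]]:
    """Group consecutive words, splitting on long pauses or overly long clips."""
    groups: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        too_far = bool(current) and w.start - current[-1].end > max_gap
        too_long = bool(current) and w.end - current[0].start > max_len
        if too_far or too_long:
            groups.append(current)
            current = []
        current.append(w)
    if current:
        groups.append(current)
    return groups


def build_samples(words: List[Word], yaw_degrees: float) -> List[Sample]:
    """Turn timestamped words into annotated clips sharing one pose label."""
    pose = pose_bucket(yaw_degrees)
    return [
        Sample(" ".join(w.text for w in g), g[0].start, g[-1].end, pose)
        for g in group_words(words)
    ]


if __name__ == "__main__":
    words = [Word("hola", 0.0, 0.4), Word("mundo", 0.45, 0.9),
             Word("adiós", 2.0, 2.5)]
    for sample in build_samples(words, yaw_degrees=5.0):
        print(sample)
```

In a real pipeline the yaw angle would come from a face landmark or head pose estimator run on the video frames, and the clip boundaries would be used to cut the video; both steps are left out here.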

Cite

Córdova-Esparza, D. M., Terven, J., Romero, A., & Herrera-Navarro, A. M. (2019). Audio-Visual Database for Spanish-Based Speech Recognition Systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11835 LNAI, pp. 452–460). Springer. https://doi.org/10.1007/978-3-030-33749-0_36