Automatic emotion recognition is renowned as a difficult task, even for human intelligence. Because classification problems demand sufficient data, we introduce a framework developed to generate labeled audio and build our own database. In this paper we present a new model for audio-video emotion recognition using Transfer Learning (TL). The idea is to combine a pre-trained Convolutional Neural Network (CNN), used as a high-level feature extractor, with a Bidirectional Recurrent Neural Network (BRNN) to address the issue of variable-length sequence inputs. Throughout the design process we discuss the main challenges stemming from the high complexity and inherently subjective nature of the task and, on the other hand, the important results obtained by testing the model on different databases, outperforming state-of-the-art algorithms on the SAVEE [3] database. Furthermore, we use the mentioned application to perform per-user precision classification in low-resource real-world scenarios, with promising results.
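The CNN-plus-BRNN combination described above can be sketched as follows. This is a minimal illustrative example, not the authors' exact architecture: layer sizes, the six-class output, and the use of a tiny untrained CNN (standing in for the pre-trained feature extractor) are all assumptions; the point is how per-frame CNN features and packed sequences let one recurrent model consume clips of different lengths.

```python
import torch
import torch.nn as nn

class CnnBrnnEmotionModel(nn.Module):
    """Hypothetical sketch: per-frame CNN features fed to a bidirectional GRU."""

    def __init__(self, feat_dim=128, hidden=64, n_classes=6):
        super().__init__()
        # Stand-in for a pre-trained image backbone; a small untrained CNN
        # is used here so the sketch runs offline without downloading weights.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        self.brnn = nn.GRU(feat_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, frames, lengths):
        # frames: (batch, max_T, 3, H, W); lengths: true frame count per clip
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        # Packing lets the GRU ignore the padded tail of shorter clips.
        packed = nn.utils.rnn.pack_padded_sequence(
            feats, lengths, batch_first=True, enforce_sorted=False)
        _, h = self.brnn(packed)  # h: (2, batch, hidden), final state per direction
        return self.head(torch.cat([h[0], h[1]], dim=1))

model = CnnBrnnEmotionModel()
clips = torch.randn(2, 5, 3, 32, 32)    # two clips padded to 5 frames each
logits = model(clips, lengths=[5, 3])   # second clip really has only 3 frames
print(logits.shape)
```

In a transfer-learning setting the `cnn` module would be replaced by frozen pre-trained layers, with only the recurrent part and classification head trained on the emotion data.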
CITATION STYLE
Cano Montes, A., & Hernández Gómez, L. A. (2020). Audio-visual emotion recognition system for variable length spatio-temporal samples using deep transfer-learning. In Lecture Notes in Business Information Processing (Vol. 389 LNBIP, pp. 434–446). Springer. https://doi.org/10.1007/978-3-030-53337-3_32