Audio-visual emotion recognition system for variable length spatio-temporal samples using deep transfer-learning


Abstract

Automatic emotion recognition is renowned for being a difficult task, even for human intelligence. Because classification problems require sufficient data, we introduce a framework developed to generate labeled audio, which we use to build our own database. In this paper we present a new model for audio-video emotion recognition using Transfer Learning (TL). The idea is to combine a pre-trained Convolutional Neural Network (CNN), acting as a high-level feature extractor, with a Bidirectional Recurrent Neural Network (BRNN) that handles variable-length sequence inputs. Throughout the design process we discuss the main difficulties of the task, which stem from its inherently subjective nature, as well as the strong results obtained by testing the model on different databases, outperforming state-of-the-art algorithms on the SAVEE [3] database. Furthermore, we use the resulting application to perform per-user precision classification in low-resource real scenarios, with promising results.
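The architecture described in the abstract — a pre-trained CNN feature extractor feeding a BRNN that accepts variable-length inputs — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, class count, and the small random-weight CNN standing in for the pre-trained backbone are all assumptions. Variable sequence length is handled with PyTorch's `pack_padded_sequence`, so padded frames do not contribute to the recurrent state.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class AVEmotionNet(nn.Module):
    """Hypothetical sketch: per-frame CNN features -> BiGRU -> emotion logits."""

    def __init__(self, feat_dim=128, hidden=64, n_classes=7):
        super().__init__()
        # Stand-in for a pre-trained CNN backbone (in a transfer-learning
        # setting this would be a frozen, pre-trained network).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Bidirectional recurrent model over the per-frame feature sequence.
        self.brnn = nn.GRU(feat_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, frames, lengths):
        # frames: (B, T, 3, H, W), zero-padded to max length T
        # lengths: true frame count of each clip in the batch
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        packed = pack_padded_sequence(feats, lengths, batch_first=True,
                                      enforce_sorted=False)
        _, h = self.brnn(packed)            # h: (2, B, hidden)
        h = torch.cat([h[0], h[1]], dim=1)  # concat forward/backward states
        return self.head(h)

model = AVEmotionNet()
clips = torch.zeros(2, 5, 3, 32, 32)            # 2 padded clips, max 5 frames
logits = model(clips, lengths=torch.tensor([5, 3]))
print(tuple(logits.shape))                      # (2, 7)
```

Packing the padded sequences lets one batch contain clips of different durations, which is the "variable length spatio-temporal samples" aspect the title refers to.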


APA

Cano Montes, A., & Hernández Gómez, L. A. (2020). Audio-visual emotion recognition system for variable length spatio-temporal samples using deep transfer-learning. In Lecture Notes in Business Information Processing (Vol. 389 LNBIP, pp. 434–446). Springer. https://doi.org/10.1007/978-3-030-53337-3_32
