Identification of multimodal signals for emotion recognition in the context of human-robot interaction

This paper proposes a method for identifying multimodal signals to recognize four human emotions in the context of human-robot interaction: happiness, anger, surprise, and neutrality. We propose a multiclass classifier built on two unimodal classifiers: one that processes video input and another that processes audio. For video, we propose a multiclass image classifier based on a convolutional neural network, which achieved a generalization accuracy of 86.4% on individual frames and 100% when used to detect emotions in a video stream. For audio, we propose a multiclass classifier composed of several one-class classifiers, one per emotion, which achieved a generalization accuracy of 69.7%. The complete system shows a generalization error of 0% and was tested with several real users in a sales-robot application.
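
As an illustration of the two composition ideas mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that the per-emotion one-class classifiers each return a comparable score and are combined by taking the highest one, and that per-frame predictions are aggregated over a video stream by majority vote; the abstract does not state either rule. The function names and the toy centroid-based scorers are hypothetical.

```python
from collections import Counter

EMOTIONS = ("happiness", "anger", "surprise", "neutrality")


def classify_with_one_class_models(sample, scorers):
    """Pick the emotion whose one-class scorer rates the sample highest.

    Assumption: each scorer returns a score where larger means the sample
    looks more like that emotion, and scores are comparable across models.
    """
    scores = {emotion: scorer(sample) for emotion, scorer in scorers.items()}
    return max(scores, key=scores.get)


def classify_video_stream(frame_labels):
    """Aggregate per-frame labels into one label by majority vote (assumed rule)."""
    return Counter(frame_labels).most_common(1)[0][0]


# Toy stand-ins for trained one-class models: each scores a 1-D feature
# by its negative distance to a hypothetical per-emotion centroid.
centroids = {"happiness": 0.0, "anger": 1.0, "surprise": 2.0, "neutrality": 3.0}
scorers = {
    emotion: (lambda x, c=center: -abs(x - c))
    for emotion, center in centroids.items()
}

audio_label = classify_with_one_class_models(1.2, scorers)
video_label = classify_video_stream(["happiness", "anger", "happiness"])
```

A majority vote of this kind is one way a per-frame classifier with imperfect accuracy could still label a whole video stream correctly, since isolated frame errors are outvoted.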




Pérez, A. K., Quintero, C. A., Rodríguez, S., Rojas, E., Peña, O., & De La Rosa, F. (2018). Identification of multimodal signals for emotion recognition in the context of human-robot interaction. In Communications in Computer and Information Science (Vol. 820, pp. 67–80). Springer Verlag.

