Emotion recognition in videos via fusing multimodal features

Abstract

Emotion recognition is a challenging task with a wide range of applications. In this paper, we present our system for the CCPR 2016 multimodal emotion recognition challenge. Multimodal features from acoustic signals, facial expressions, and speech content are extracted to recognize the emotion of the character in each video. Among them, the facial CNN feature is the most discriminative for emotion recognition. We train SVM and random forest classifiers on each feature type and use early and late fusion to combine the different modalities. To address class imbalance, we adapt the probability threshold for each emotion class. Our best multimodal fusion system achieves a macro precision of 50.34% on the test set, significantly outperforming the baseline of 30.63%.
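The late-fusion and threshold-adaptation steps described above can be illustrated with a short sketch. This is not the authors' implementation: the feature dimensions, classifier settings, fusion weights, and threshold values below are placeholder assumptions, with random arrays standing in for the real acoustic and facial CNN features.

```python
# Illustrative sketch of late fusion with per-class probability thresholds.
# All features, weights, and thresholds are hypothetical, not from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 50, 7  # assumed sizes

# Stand-ins for per-modality features (e.g., acoustic and facial CNN).
X_acoustic = rng.normal(size=(n_train, 32))
X_facial = rng.normal(size=(n_train, 64))
y = rng.integers(0, n_classes, size=n_train)
y[:n_classes] = np.arange(n_classes)  # ensure every class appears in training
Xt_acoustic = rng.normal(size=(n_test, 32))
Xt_facial = rng.normal(size=(n_test, 64))

# Early fusion would instead concatenate features before training a single
# classifier, e.g. np.hstack([X_acoustic, X_facial]).

# One classifier per modality (the paper trains SVMs and random forests).
clf_a = SVC(probability=True).fit(X_acoustic, y)
clf_f = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_facial, y)

# Late fusion: weighted average of per-modality class probabilities.
w_a, w_f = 0.4, 0.6  # illustrative weights favoring the facial modality
proba = (w_a * clf_a.predict_proba(Xt_acoustic)
         + w_f * clf_f.predict_proba(Xt_facial))

# Per-class probability thresholds to counter class imbalance: lowering a
# minority class's threshold makes that class easier to predict.
thresholds = np.full(n_classes, 1.0)
thresholds[2] = 0.6  # hypothetical boost for an under-represented emotion

# Predict the class whose probability most exceeds its threshold.
pred = np.argmax(proba / thresholds, axis=1)
print(pred[:10])
```

Dividing each class probability by its threshold before the argmax is one simple way to realize per-class threshold adaptation; the paper does not specify its exact decision rule, so this detail is an assumption.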

Citation (APA)

Chen, S., Dian, Y., Li, X., Lin, X., Jin, Q., Liu, H., & Lu, L. (2016). Emotion recognition in videos via fusing multimodal features. In Communications in Computer and Information Science (Vol. 663, pp. 632–644). Springer Verlag. https://doi.org/10.1007/978-981-10-3005-5_52
