Emotion recognition in videos via fusing multimodal features

Abstract

Emotion recognition is a challenging task with a wide range of applications. In this paper, we present our system for the CCPR 2016 multimodal emotion recognition challenge. Multimodal features from acoustic signals, facial expressions, and speech content are extracted to recognize the emotion of the character in each video. Among them, the facial CNN feature is the most discriminative for emotion recognition. We train SVM and random forest classifiers on each feature type and use early and late fusion to combine the different modalities. To address class imbalance, we adapt the probability threshold for each emotion class. Our best multimodal fusion system achieves a macro precision of 50.34% on the test set, significantly outperforming the baseline of 30.63%.
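The late-fusion and threshold-adaptation steps described above can be illustrated with a short sketch. This is not the authors' implementation: the feature dimensions, classifier settings, fusion weights, and threshold values below are placeholder assumptions, with random arrays standing in for the real acoustic and facial CNN features.

```python
# Illustrative sketch of late fusion with per-class probability thresholds.
# All features, weights, and thresholds are hypothetical, not from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 50, 7  # assumed sizes

# Stand-ins for per-modality features (e.g., acoustic and facial CNN).
X_acoustic = rng.normal(size=(n_train, 32))
X_facial = rng.normal(size=(n_train, 64))
y = rng.integers(0, n_classes, size=n_train)
y[:n_classes] = np.arange(n_classes)  # ensure every class appears in training
Xt_acoustic = rng.normal(size=(n_test, 32))
Xt_facial = rng.normal(size=(n_test, 64))

# Early fusion would instead concatenate features before training a single
# classifier, e.g. np.hstack([X_acoustic, X_facial]).

# One classifier per modality (the paper trains SVMs and random forests).
clf_a = SVC(probability=True).fit(X_acoustic, y)
clf_f = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_facial, y)

# Late fusion: weighted average of per-modality class probabilities.
w_a, w_f = 0.4, 0.6  # illustrative weights favoring the facial modality
proba = (w_a * clf_a.predict_proba(Xt_acoustic)
         + w_f * clf_f.predict_proba(Xt_facial))

# Per-class probability thresholds to counter class imbalance: lowering a
# minority class's threshold makes that class easier to predict.
thresholds = np.full(n_classes, 1.0)
thresholds[2] = 0.6  # hypothetical boost for an under-represented emotion

# Predict the class whose probability most exceeds its threshold.
pred = np.argmax(proba / thresholds, axis=1)
print(pred[:10])
```

Dividing each class probability by its threshold before the argmax is one simple way to realize per-class threshold adaptation; the paper does not specify its exact decision rule, so this detail is an assumption.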

Citation (APA)

Chen, S., Dian, Y., Li, X., Lin, X., Jin, Q., Liu, H., & Lu, L. (2016). Emotion recognition in videos via fusing multimodal features. In Communications in Computer and Information Science (Vol. 663, pp. 632–644). Springer Verlag. https://doi.org/10.1007/978-981-10-3005-5_52
