Multi-modal Fusion Using Spatio-temporal and Static Features for Group Emotion Recognition


Abstract

This paper presents our approach for the Audio-Video Group Emotion Recognition sub-challenge of EmotiW 2020. The task is to classify a video into one of three group emotions: positive, neutral, or negative. Our approach exploits two feature levels: spatio-temporal features and static features. At the spatio-temporal level, we feed multiple input modalities (RGB, RGB difference, optical flow, and warped optical flow) into multiple video classification networks to train spatio-temporal models. At the static level, we crop all faces and bodies in each frame using a state-of-the-art human pose estimation method and train several kinds of CNNs with the image-level group emotion labels. Finally, we fuse the results of all 14 models and achieve third place in this sub-challenge, with classification accuracies of 71.93% and 70.77% on the validation and test sets, respectively.
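The abstract does not specify how the 14 models' outputs are combined, but a common choice for this kind of ensemble is score-level (late) fusion: average the per-class probability vectors from each model, optionally with per-model weights, and take the argmax. The sketch below illustrates that idea; the function name, the example weights, and the three toy model outputs are all hypothetical, not from the paper.

```python
import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def fuse_predictions(score_list, weights=None):
    """Weighted average of class-probability vectors from several
    models; returns the fused label and the fused distribution.
    (Illustrative late-fusion sketch, not the paper's exact scheme.)"""
    scores = np.asarray(score_list, dtype=float)  # shape: (n_models, n_classes)
    if weights is None:
        weights = np.ones(len(scores)) / len(scores)  # uniform weighting
    fused = np.average(scores, axis=0, weights=weights)
    return CLASSES[int(np.argmax(fused))], fused

# Toy softmax outputs from three hypothetical models:
model_outputs = [
    [0.6, 0.3, 0.1],  # e.g. an RGB spatio-temporal model
    [0.5, 0.4, 0.1],  # e.g. an optical-flow model
    [0.2, 0.5, 0.3],  # e.g. a face-crop CNN
]
label, fused = fuse_predictions(model_outputs)
```

With uniform weights this is plain probability averaging; in practice ensemble weights are often tuned on the validation set.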

Citation (APA)

Sun, M., Li, J., Feng, H., Gou, W., Shen, H., Tang, J., … Ye, J. (2020). Multi-modal Fusion Using Spatio-temporal and Static Features for Group Emotion Recognition. In ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction (pp. 835–840). Association for Computing Machinery, Inc. https://doi.org/10.1145/3382507.3417971
