Towards Multimodal Prediction of Time-continuous Emotion using Pose Feature Engineering and a Transformer Encoder

Ho Min Park; Ilho Yun; Ajit Kumar; Ankit Kumar Singh; Bong Jun Choi; Dhananjay Singh; Wesley De Neve

Conference Proceedings, Open Access

MuSe 2022 - Proceedings of the 3rd International Multimodal Sentiment Analysis Workshop and Challenge (2022) 47-54

DOI: 10.1145/3551876.3554807

Abstract

MuSe-Stress 2022 aims at building sequence regression models that predict the valence and physiological arousal levels of people facing stressful conditions. To that end, audio-visual recordings, transcripts, and physiological signals can be leveraged. In this paper, we describe the approach we developed for MuSe-Stress 2022. Specifically, we engineered a new pose feature that captures the movement of human body keypoints. We also trained a Long Short-Term Memory (LSTM) network and a Transformer encoder on different types of feature sequences and combinations thereof. In addition, we adopted a two-pronged strategy to tune the hyperparameters that govern the different ways the available features can be used. Finally, we used late fusion to combine the predictions obtained for the different unimodal features. Our experimental results show that the newly engineered pose feature obtains the second-highest development concordance correlation coefficient (CCC) among the seven unimodal features available. Furthermore, our Transformer encoder obtains the highest development CCC for five out of fourteen possible combinations of features and emotion dimensions, a number that increases to nine when performing late fusion. In addition, when searching for optimal hyperparameter settings, our two-pronged hyperparameter tuning strategy leads to noticeable improvements in maximum development CCC, especially when the underlying models are based on an LSTM. In summary, our approach achieves a test CCC of 0.6196 and 0.6351 for arousal and valence, respectively, securing a Top-3 rank in MuSe-Stress 2022.
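For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a concordance correlation coefficient (CCC) can be computed between a gold-standard signal and a predicted signal, using the standard definition with population variances. The function names `ccc` and `late_fuse_mean` are hypothetical and chosen for illustration; the unweighted mean used for fusion is only a simple stand-in, since the abstract does not specify how the authors' late fusion is implemented.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance correlation coefficient between two 1-D sequences.

    Uses population (ddof=0) variance and covariance. The result is
    undefined (division by zero) if both inputs are constant and equal.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def late_fuse_mean(predictions):
    """Hypothetical late fusion: unweighted mean over per-feature predictions.

    `predictions` is a list of equal-length sequences, one per unimodal model.
    This is an illustrative assumption, not the method described in the paper.
    """
    return np.mean(np.stack([np.asarray(p, dtype=float) for p in predictions]), axis=0)


if __name__ == "__main__":
    gold = np.array([0.0, 1.0, 2.0, 3.0])
    shifted = gold + 1.0  # perfectly correlated, but biased by a constant offset
    print(ccc(gold, gold))
    print(ccc(gold, shifted))
    print(ccc(gold, late_fuse_mean([gold, shifted])))
```

Unlike Pearson correlation, CCC penalizes a constant offset or a scale mismatch between prediction and gold standard, which is why the shifted sequence in the usage example scores below 1 even though it is perfectly correlated with the gold signal.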

Citation (APA)

Park, H. M., Yun, I., Kumar, A., Singh, A. K., Choi, B. J., Singh, D., & De Neve, W. (2022). Towards Multimodal Prediction of Time-continuous Emotion using Pose Feature Engineering and a Transformer Encoder. In MuSe 2022 - Proceedings of the 3rd International Multimodal Sentiment Analysis Workshop and Challenge (pp. 47–54). Association for Computing Machinery, Inc. https://doi.org/10.1145/3551876.3554807
