Towards Multimodal Prediction of Time-continuous Emotion using Pose Feature Engineering and a Transformer Encoder

Abstract

MuSe-Stress 2022 aims at building sequence regression models that predict the valence and physiological arousal levels of people facing stressful conditions. To that end, audio-visual recordings, transcripts, and physiological signals can be leveraged. In this paper, we describe the approach we developed for MuSe-Stress 2022. Specifically, we engineered a new pose feature that captures the movement of human body keypoints. We also trained a Long Short-Term Memory (LSTM) network and a Transformer encoder on different types of feature sequences and combinations thereof. In addition, we adopted a two-pronged strategy to tune the hyperparameters that govern the different ways the available features can be used. Finally, we used late fusion to combine the predictions obtained for the different unimodal features. Our experimental results show that the newly engineered pose feature obtains the second-highest development Concordance Correlation Coefficient (CCC) among the seven unimodal features available. Furthermore, our Transformer encoder obtains the highest development CCC for five out of fourteen possible combinations of features and emotion dimensions, with this number increasing to nine when performing late fusion. In addition, our two-pronged hyperparameter tuning strategy leads to noticeable improvements in maximum development CCC, especially when the underlying models are based on an LSTM. Overall, our approach achieves a test CCC of 0.6196 for arousal and 0.6351 for valence, securing a Top-3 rank in MuSe-Stress 2022.
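
All results in the abstract are reported as CCC, the Concordance Correlation Coefficient, which rewards predictions that both correlate with the gold-standard signal and match its mean and scale. The sketch below shows the standard definition of the metric in plain Python. It is illustrative only and is not taken from the authors' code: the function name `concordance_cc` and the choice to raise an error for two constant, identical sequences are assumptions made here.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance Correlation Coefficient between two equal-length sequences.

    Hypothetical helper for illustration; uses population (biased) variance
    and covariance, as in the standard definition of the metric.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Assumption: treat two constant, identical sequences as undefined.
        raise ValueError("CCC is undefined for constant, identical sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    # A constant offset lowers CCC even though Pearson correlation stays at 1.
    print(concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Unlike Pearson correlation, the squared mean difference in the denominator penalizes predictions that track the shape of the target but are shifted away from it, which is why a perfectly correlated but offset prediction scores below 1.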

Cite

Park, H. M., Yun, I., Kumar, A., Singh, A. K., Choi, B. J., Singh, D., & De Neve, W. (2022). Towards Multimodal Prediction of Time-continuous Emotion using Pose Feature Engineering and a Transformer Encoder. In MuSe 2022 - Proceedings of the 3rd International Multimodal Sentiment Analysis Workshop and Challenge (pp. 47–54). Association for Computing Machinery, Inc. https://doi.org/10.1145/3551876.3554807
