Self-supervised representation learning using multimodal Transformer for emotion recognition

Abstract

In this paper, we present Modality-Agnostic Transformer-based Self-Supervised Learning (MATS2L) for emotion recognition using physiological signals. The proposed approach consists of two stages: a) a pretext stage, where the transformer model is pre-trained on unlabeled physiological signal data using masked signal prediction as the pre-training task and forms contextualized signal representations; and b) a downstream stage, where the self-supervised learning (SSL) representations extracted from the pre-trained model are used for emotion recognition tasks. The modality-agnostic approach allows the transformer model to focus on mutual features shared across different physiological signals and to learn more meaningful embeddings for estimating emotions effectively. We conduct several experiments on the public WESAD dataset and compare against fully supervised and other competitive SSL approaches. Experimental results show that the proposed approach learns meaningful features and is superior to other competitive SSL approaches. Moreover, a transformer model trained on SSL features outperforms a fully supervised transformer model. We also present detailed ablation studies to demonstrate the robustness of our approach.
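The masked signal prediction pretext task described above can be illustrated with a short sketch. Everything below is a hypothetical reconstruction in PyTorch: the class name MaskedSignalTransformer, all layer sizes, the patch length, and the ~15% mask ratio are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MaskedSignalTransformer(nn.Module):
    """Sketch of the pretext stage: a transformer encoder pre-trained to
    reconstruct masked segments of a physiological signal. Hyperparameters
    are illustrative, not taken from the paper."""

    def __init__(self, d_model=64, n_heads=4, n_layers=4, patch_len=16):
        super().__init__()
        # Project each signal patch into a shared (modality-agnostic) embedding space.
        self.embed = nn.Linear(patch_len, d_model)
        # Learnable token substituted for masked patches.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Reconstruction head, used only during pre-training.
        self.head = nn.Linear(d_model, patch_len)

    def forward(self, patches, mask):
        # patches: (batch, seq, patch_len); mask: (batch, seq) bool, True = masked.
        x = self.embed(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        h = self.encoder(x)  # contextualized signal representations
        return self.head(h), h

# One pre-training step: predict only the masked patches (masked signal prediction).
model = MaskedSignalTransformer()
signal = torch.randn(8, 32, 16)   # stand-in for unlabeled physiological signal patches
mask = torch.rand(8, 32) < 0.15   # randomly mask ~15% of patches (assumed ratio)
recon, _ = model(signal, mask)
loss = nn.functional.mse_loss(recon[mask], signal[mask])
loss.backward()
```

In the downstream stage, the reconstruction head would be discarded and the contextualized representations h (e.g., mean-pooled over time) would feed an emotion classifier; this, too, is a sketch rather than the paper's exact pipeline.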

Cite (APA)

Goetz, T., Arora, P., Erick, F. X., Holzer, N., & Sawant, S. (2023). Self-supervised representation learning using multimodal Transformer for emotion recognition. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3615834.3615837
