Self-supervised representation learning using multimodal Transformer for emotion recognition

Abstract

In this paper, we present Modality-Agnostic Transformer-based Self-Supervised Learning (MATS2L) for emotion recognition using physiological signals. The proposed approach consists of two stages: a) a pretext stage, where the transformer model is pre-trained on unlabeled physiological signal data using masked signal prediction as the pre-training task and forms contextualized signal representations; and b) a downstream stage, where the self-supervised learning (SSL) representations extracted from the pre-trained model are used for emotion recognition tasks. The modality-agnostic approach allows the transformer model to focus on mutual features shared across different physiological signals and to learn more meaningful embeddings for estimating emotions effectively. We conduct several experiments on the public WESAD dataset and compare against fully supervised and other competitive SSL approaches. Experimental results show that the proposed approach learns meaningful features and is superior to other competitive SSL approaches. Moreover, a transformer model trained on SSL features outperforms a fully supervised transformer model. We also present detailed ablation studies to demonstrate the robustness of our approach.
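The masked signal prediction pretext task described above can be illustrated with a short sketch. Everything below is a hypothetical reconstruction in PyTorch: the class name MaskedSignalTransformer, all layer sizes, the patch length, and the ~15% mask ratio are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MaskedSignalTransformer(nn.Module):
    """Sketch of the pretext stage: a transformer encoder pre-trained to
    reconstruct masked segments of a physiological signal. Hyperparameters
    are illustrative, not taken from the paper."""

    def __init__(self, d_model=64, n_heads=4, n_layers=4, patch_len=16):
        super().__init__()
        # Project each signal patch into a shared (modality-agnostic) embedding space.
        self.embed = nn.Linear(patch_len, d_model)
        # Learnable token substituted for masked patches.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Reconstruction head, used only during pre-training.
        self.head = nn.Linear(d_model, patch_len)

    def forward(self, patches, mask):
        # patches: (batch, seq, patch_len); mask: (batch, seq) bool, True = masked.
        x = self.embed(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        h = self.encoder(x)  # contextualized signal representations
        return self.head(h), h

# One pre-training step: predict only the masked patches (masked signal prediction).
model = MaskedSignalTransformer()
signal = torch.randn(8, 32, 16)   # stand-in for unlabeled physiological signal patches
mask = torch.rand(8, 32) < 0.15   # randomly mask ~15% of patches (assumed ratio)
recon, _ = model(signal, mask)
loss = nn.functional.mse_loss(recon[mask], signal[mask])
loss.backward()
```

In the downstream stage, the reconstruction head would be discarded and the contextualized representations h (e.g., mean-pooled over time) would feed an emotion classifier; this, too, is a sketch rather than the paper's exact pipeline.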

Cite (APA)

Goetz, T., Arora, P., Erick, F. X., Holzer, N., & Sawant, S. (2023). Self-supervised representation learning using multimodal Transformer for emotion recognition. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3615834.3615837
