In this paper, we present a Modality-Agnostic Transformer-based Self-Supervised Learning (MATS2L) approach for emotion recognition from physiological signals. The proposed approach consists of two stages: a) a pretext stage, in which the transformer model is pre-trained on unlabeled physiological signal data using masked signal prediction as the pre-training task, forming contextualized signal representations; and b) a downstream stage, in which the self-supervised learning (SSL) representations extracted from the pre-trained model are used for emotion recognition tasks. The modality-agnostic design allows the transformer model to focus on mutual features shared across different physiological signals and to learn more meaningful embeddings for effective emotion estimation. We conduct several experiments on the public WESAD dataset and compare against fully supervised and other competitive SSL approaches. Experimental results show that the proposed approach learns meaningful features and outperforms competing SSL approaches. Moreover, a transformer model trained on SSL features outperforms a fully supervised transformer model. We also present detailed ablation studies demonstrating the robustness of our approach.
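To make the pretext stage concrete, the sketch below illustrates masked signal prediction with a transformer encoder in PyTorch. This is a minimal illustration under assumed settings, not the authors' implementation: the class name MaskedSignalTransformer, the hyperparameters, the learned mask token, and the mean-squared-error loss restricted to masked time steps are all assumptions, since the abstract does not specify architectural details.

```python
import torch
import torch.nn as nn

class MaskedSignalTransformer(nn.Module):
    """Minimal transformer encoder for masked signal prediction (hypothetical sketch)."""
    def __init__(self, signal_dim=1, d_model=64, nhead=4, num_layers=4, seq_len=256):
        super().__init__()
        self.embed = nn.Linear(signal_dim, d_model)                  # project raw samples to model dim
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))    # learned positional encoding
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))   # learned [MASK] embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, signal_dim)                   # reconstruct signal values

    def forward(self, x, mask):
        # x: (batch, seq_len, signal_dim); mask: (batch, seq_len) bool, True = masked step
        h = self.embed(x)
        h = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(h), h)
        h = self.encoder(h + self.pos)
        return self.head(h)

def pretext_step(model, x, mask_ratio=0.15):
    """One pretext-task step: mask random time steps, predict the original values there."""
    mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
    pred = model(x, mask)
    return ((pred - x) ** 2)[mask].mean()  # MSE computed only on masked positions
```

In the downstream stage, one would discard the reconstruction head and feed the encoder's contextualized representations to an emotion classifier; the masking ratio and loss placement above are illustrative choices, not values reported by the paper.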