Multimodal physiological signal emotion recognition based on multi-head cross attention with representation learning

Abstract

Introduction: Physiological signals are well suited to emotion recognition because they are objective: they are less susceptible to volitional control and therefore reflect an individual's affective state more faithfully. Combining several physiological modalities characterizes emotions more completely, which has made multimodal emotion recognition a critical research area. However, existing multimodal fusion methods often fail to capture the complex, dynamic interactions and correlations between modalities, and so make limited use of complementary information from other physiological signals during feature learning.

Methods: To address these shortcomings, we propose a novel framework for multimodal physiological emotion recognition. The framework learns and extracts features from multiple modalities simultaneously, mirroring the integrative process of human emotion perception. A dual-branch representation learning architecture processes electroencephalography (EEG) and peripheral signals separately, providing high-quality inputs for subsequent feature fusion. A cross attention mechanism tailored to multimodal signals then exploits the richness and complementarity of the information across modalities. This approach improves the accuracy of emotion recognition and enhances robustness to issues such as missing modalities and noise.

Results: Experiments on the public DEAP and SEED-IV multimodal physiological signal datasets show that the proposed model outperforms other state-of-the-art models on the emotion classification task. These findings indicate that the model can effectively extract and fuse features from multimodal physiological signals.

Discussion: These results underscore the potential of the model in affective computing and have implications for research in healthcare and human-computer interaction.
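The abstract does not give architectural details, so the following is only a minimal sketch of the general idea behind multi-head cross attention between two modality branches, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`multi_head_cross_attention`, `init_params`), the feature dimensions, the token counts, the random stand-in features for the EEG and peripheral branches, and the mean-pool-and-concatenate fusion step.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(query_feats, context_feats, params, num_heads):
    """Let one modality (queries) attend to another (keys and values).

    query_feats:   (n_query_tokens, d_model)
    context_feats: (n_context_tokens, d_model)
    Returns attended features (n_query_tokens, d_model) and attention
    weights (num_heads, n_query_tokens, n_context_tokens).
    """
    d_model = query_feats.shape[-1]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    def split_heads(x):
        # (tokens, d_model) -> (num_heads, tokens, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(query_feats @ params["w_q"])
    k = split_heads(context_feats @ params["w_k"])
    v = split_heads(context_feats @ params["w_v"])

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, n_query_tokens, d_head)

    # Concatenate the heads and apply the output projection.
    merged = heads.transpose(1, 0, 2).reshape(query_feats.shape[0], d_model)
    return merged @ params["w_o"], weights


def init_params(d_model, rng):
    """Random projection matrices; a trained model would learn these."""
    scale = 1.0 / np.sqrt(d_model)
    return {name: rng.normal(0.0, scale, size=(d_model, d_model))
            for name in ("w_q", "w_k", "w_v", "w_o")}


rng = np.random.default_rng(0)
d_model, num_heads = 32, 4  # hypothetical sizes

# Random stand-ins for the outputs of the two representation branches.
eeg_feats = rng.normal(size=(10, d_model))        # 10 EEG tokens (assumed)
peripheral_feats = rng.normal(size=(6, d_model))  # 6 peripheral tokens (assumed)

# Each modality queries the other, so information flows in both directions.
eeg_to_per, w_ep = multi_head_cross_attention(
    eeg_feats, peripheral_feats, init_params(d_model, rng), num_heads)
per_to_eeg, w_pe = multi_head_cross_attention(
    peripheral_feats, eeg_feats, init_params(d_model, rng), num_heads)

# Assumed fusion step: mean-pool each direction and concatenate.
fused = np.concatenate([eeg_to_per.mean(axis=0), per_to_eeg.mean(axis=0)])
print(fused.shape)  # (64,)
```

In this sketch the fused vector would feed a classifier head. Using the other modality as keys and values is what distinguishes cross attention from self attention, where queries, keys, and values all come from the same signal.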

Cite

Ding, S., Ma, L., & Li, H. (2025). Multimodal physiological signal emotion recognition based on multi-head cross attention with representation learning. Frontiers in Psychiatry, 16. https://doi.org/10.3389/fpsyt.2025.1713559
