Abstract
Introduction: Physiological signals offer a significant advantage for emotion recognition because they are objective: they are less susceptible to volitional control and therefore reflect an individual's true affective state more faithfully. Combining multiple physiological modalities enables a more holistic characterization of emotions, which has made multimodal emotion recognition a critical area of research. However, existing multimodal fusion methods often fail to capture the complex, dynamic interactions and correlations between modalities. Consequently, they do not fully leverage complementary information from other physiological signals during feature learning.
Methods: To address these shortcomings, we propose a novel framework for multimodal physiological emotion recognition. The framework learns and extracts features from multiple modalities simultaneously, mirroring the integrative process of human emotion perception. It uses a dual-branch representation learning architecture to process electroencephalography (EEG) and peripheral signals separately, providing high-quality inputs for subsequent feature fusion. Furthermore, we employ a cross-attention mechanism tailored to multimodal signals to exploit the richness and complementarity of the information. This approach not only improves the accuracy of emotion recognition but also enhances robustness against issues such as missing modalities and noise, thereby achieving precise classification of emotions from multimodal signals.
Results: Experiments on the public DEAP and SEED-IV multimodal physiological signal datasets show that the proposed model outperforms other state-of-the-art models on the emotion classification task. These findings indicate that the proposed model can effectively extract and fuse features from multimodal physiological signals.
Discussion: These results underscore the potential of our model in affective computing and hold significant implications for research in healthcare and human-computer interaction.
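The fusion step described under Methods can be illustrated with a minimal sketch of multi-head cross attention between two feature streams. This is not the authors' implementation: the abstract gives no layer sizes, sequence lengths, or pooling strategy, so `d_model`, `num_heads`, the random feature matrices standing in for the two branch outputs, the mean pooling, and the helper names `init_params` and `multi_head_cross_attention` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(rng, d):
    # Hypothetical randomly initialised projection matrices (untrained).
    names = ("Wq", "Wk", "Wv", "Wo")
    return {n: rng.normal(scale=d ** -0.5, size=(d, d)) for n in names}

def multi_head_cross_attention(query_feats, context_feats, params, num_heads):
    """Queries come from one modality, keys/values from the other.

    query_feats: (Tq, d), context_feats: (Tk, d).
    Returns the attended features (Tq, d) and the weights (heads, Tq, Tk).
    """
    Tq, d = query_feats.shape
    Tk = context_feats.shape[0]
    head_dim = d // num_heads

    q = query_feats @ params["Wq"]
    k = context_feats @ params["Wk"]
    v = context_feats @ params["Wv"]

    # Split the feature dimension into heads: (heads, T, head_dim).
    q = q.reshape(Tq, num_heads, head_dim).transpose(1, 0, 2)
    k = k.reshape(Tk, num_heads, head_dim).transpose(1, 0, 2)
    v = v.reshape(Tk, num_heads, head_dim).transpose(1, 0, 2)

    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    out = weights @ v

    # Merge heads back to (Tq, d) and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(Tq, d)
    return out @ params["Wo"], weights

rng = np.random.default_rng(0)
d_model, num_heads = 32, 4                   # assumed sizes
eeg_feats = rng.normal(size=(10, d_model))   # stand-in for EEG branch output
per_feats = rng.normal(size=(6, d_model))    # stand-in for peripheral branch output

# Each modality attends to the other, then the pooled results are concatenated.
eeg_attended, w_eeg = multi_head_cross_attention(
    eeg_feats, per_feats, init_params(rng, d_model), num_heads)
per_attended, w_per = multi_head_cross_attention(
    per_feats, eeg_feats, init_params(rng, d_model), num_heads)
fused = np.concatenate([eeg_attended.mean(axis=0), per_attended.mean(axis=0)])
print(fused.shape)
```

In this sketch the fused vector would feed a classifier head; attending in both directions is one plausible way to let each modality draw on the other's complementary information, though the paper may combine the streams differently.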
Ding, S., Ma, L., & Li, H. (2025). Multimodal physiological signal emotion recognition based on multi-head cross attention with representation learning. Frontiers in Psychiatry, 16. https://doi.org/10.3389/fpsyt.2025.1713559