Multimodal learning aims to process and relate information from different modalities to enhance a model's capacity for perception. Current multimodal fusion mechanisms either do not closely align the feature spaces or are computationally expensive for training and inference. In this paper, we present X-Norm, a novel, simple, and efficient method for bimodal fusion that generates and exchanges limited but meaningful normalization parameters between the modalities, implicitly aligning the feature spaces. We conduct extensive experiments on two tasks, emotion recognition and action recognition, with different architectures including Transformer-based and CNN-based models, using IEMOCAP and MSP-IMPROV for emotion recognition and EPIC-KITCHENS for action recognition. The experimental results show that X-Norm achieves performance comparable or superior to existing methods, including early and late fusion, Gradient-Blending (G-Blend) [44], Tensor Fusion Network [48], and Multimodal Transformer [40], at a relatively low training cost.
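The following is a minimal PyTorch sketch of the idea the abstract describes, namely two unimodal streams that exchange normalization parameters (a scale and shift) so each modality's features are modulated by the other. It is an illustration under assumptions, not the authors' X-Norm implementation; the module and variable names (ExchangeNorm, BimodalFusion, dim, etc.) are hypothetical.

```python
# Illustrative sketch only: each modality generates (gamma, beta) that
# re-scale and shift the *other* modality's normalized features.
import torch
import torch.nn as nn


class ExchangeNorm(nn.Module):
    """Normalize one modality's features, then modulate them with
    parameters generated from the partner modality (assumed design)."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # Small head mapping partner features to (gamma, beta).
        self.to_gamma_beta = nn.Linear(dim, 2 * dim)

    def forward(self, x: torch.Tensor, partner: torch.Tensor) -> torch.Tensor:
        # x, partner: (batch, seq_len, dim)
        gamma, beta = self.to_gamma_beta(partner).chunk(2, dim=-1)
        return self.norm(x) * (1 + gamma) + beta


class BimodalFusion(nn.Module):
    """Two streams exchanging normalization parameters, then a classifier."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.norm_a = ExchangeNorm(dim)
        self.norm_b = ExchangeNorm(dim)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        fused_a = self.norm_a(feat_a, partner=feat_b)
        fused_b = self.norm_b(feat_b, partner=feat_a)
        # Pool over the sequence dimension and classify.
        pooled = torch.cat([fused_a.mean(dim=1), fused_b.mean(dim=1)], dim=-1)
        return self.classifier(pooled)


if __name__ == "__main__":
    model = BimodalFusion(dim=64, num_classes=4)
    audio = torch.randn(2, 50, 64)   # e.g., acoustic features
    video = torch.randn(2, 50, 64)   # e.g., visual features
    print(model(audio, video).shape)  # torch.Size([2, 4])
```

Because only a small set of normalization parameters is exchanged rather than full feature maps or cross-attention activations, this style of fusion adds little training overhead, which is consistent with the efficiency claim in the abstract.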
CITATION STYLE
Yin, Y., Xu, J., Zu, T., & Soleymani, M. (2022). X-Norm: Exchanging Normalization Parameters for Bimodal Fusion. In ACM International Conference Proceeding Series (pp. 605–614). Association for Computing Machinery. https://doi.org/10.1145/3536221.3556581