Abstract
Fusion of multi-modality images integrates complementary information from different sensors, creating a richer and more comprehensive representation. Traditional fusion methods, which rely on element-wise addition or feature-channel concatenation, often fail to fully fuse crucial information. To address these limitations, we propose a novel model based on vision transformers and an adaptive feature fusion network. Our model includes a multi-level feature decoupling layer that separates global and modality-specific features, combined with an attention-based adaptive dynamic fusion strategy. This strategy dynamically weights features according to their importance, enabling effective cross-modal fusion. Extensive experiments demonstrate our model's superior performance, particularly in infrared-visible fusion, with significant improvements in metrics such as mutual information (MI). Our approach not only preserves information from the source images but also produces fused images with high contrast and clear texture details. The results indicate the potential of our model in various applications, including military surveillance, remote sensing, and object detection. The code is available at https://github.com/jiejie2-code/ADF.git.
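As a concrete illustration of the attention-based dynamic weighting described in the abstract, below is a minimal PyTorch sketch of what such a fusion block could look like. This is not the authors' implementation (see their repository for that); the module name `AdaptiveFusion`, the two-modality inputs `feat_ir`/`feat_vis`, and the convolutional attention head are all assumptions made for illustration. The key idea it demonstrates is replacing fixed element-wise addition or concatenation with a learned, per-location convex combination of the two feature maps.

```python
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Hypothetical attention-based adaptive fusion block (names assumed).

    Given feature maps from two modalities (e.g. infrared and visible),
    a small attention head predicts a per-pixel weight map, and the
    features are blended as a convex combination rather than fused by
    fixed addition or channel concatenation.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Attention head: sees both modalities, outputs one weight map in (0, 1).
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        # w has shape (B, 1, H, W) and is broadcast over the channel dimension.
        w = self.attn(torch.cat([feat_ir, feat_vis], dim=1))
        return w * feat_ir + (1.0 - w) * feat_vis


if __name__ == "__main__":
    fuse = AdaptiveFusion(channels=64)
    ir = torch.randn(1, 64, 32, 32)   # infrared feature map
    vis = torch.randn(1, 64, 32, 32)  # visible feature map
    print(fuse(ir, vis).shape)        # torch.Size([1, 64, 32, 32])
```

The MI metric highlighted in the abstract is conventionally computed from the joint gray-level histogram of the fused image with each source image; for fusion quality one reports the sum MI(fused, ir) + MI(fused, vis). A small NumPy sketch of the standard histogram-based estimator (the function name `mutual_information` is our own):

```python
import numpy as np


def mutual_information(img_a: np.ndarray, img_b: np.ndarray, bins: int = 256) -> float:
    """MI between two grayscale images, estimated from their joint histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint probability
    px = pxy.sum(axis=1, keepdims=True)       # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of img_b
    nz = pxy > 0                              # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```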
Citation
Xiao, W., Chen, J., Pan, C., Wang, T., & Jiang, L. (2025). Adaptive dynamic fusion of multi-modality features for enhanced image representation. Visual Computer, 41(12), 10055–10067. https://doi.org/10.1007/s00371-025-04021-5