RGB-T tracking fuses visible (RGB) and thermal (T) modalities to achieve more robust object tracking. Existing popular RGB-T trackers often fail to fully exploit background information and the complementary information across modalities. To address these issues, we propose the target-aware enhanced fusion network (TEFNet). TEFNet concatenates the template and search-region features of each modality and applies self-attention to enhance the single-modality features of the target by discriminating it from the background. In addition, a background elimination module is introduced to reduce background regions. To further fuse complementary information across modalities, a dual-layer fusion module based on channel attention, self-attention, and bidirectional cross-attention is constructed. This module suppresses features from the inferior modality and amplifies features from the dominant modality, effectively mitigating the adverse effects of modality differences. Experimental results on the LasHeR and VTUAV datasets demonstrate that our method outperforms other representative RGB-T trackers, with significant gains of 6.6% in MPR and 7.1% in MSR on the VTUAV dataset.
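To make the dual-layer fusion idea concrete, the following is a minimal PyTorch sketch: an SE-style channel attention first re-weights each modality's features, and a bidirectional cross-attention layer then lets each modality query the other for complementary cues. All module names, dimensions, and the specific SE-style gating are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch of a dual-layer fusion module (not the paper's code).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style gate over the channel dimension of token features (B, N, C)."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x.mean(dim=1, keepdim=True))  # squeeze over tokens
        return x * w                                # amplify/suppress channels

class DualLayerFusion(nn.Module):
    """Channel attention per modality, then bidirectional cross-attention."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.ca_rgb = ChannelAttention(dim)
        self.ca_t = ChannelAttention(dim)
        self.rgb_from_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t_from_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        # Layer 1: channel gates suppress the inferior modality's channels
        # and amplify the dominant modality's channels.
        f_rgb, f_t = self.ca_rgb(f_rgb), self.ca_t(f_t)
        # Layer 2: each modality queries the other for complementary cues.
        rgb_enh, _ = self.rgb_from_t(f_rgb, f_t, f_t)
        t_enh, _ = self.t_from_rgb(f_t, f_rgb, f_rgb)
        f_rgb = self.norm_rgb(f_rgb + rgb_enh)
        f_t = self.norm_t(f_t + t_enh)
        return f_rgb + f_t  # fused representation for the tracking head

# Usage (assumed token layout): fused = DualLayerFusion()(rgb_tokens, t_tokens)
# where rgb_tokens and t_tokens are (B, N, C) concatenated template/search features.
```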