Multimodal Fusion Remote Sensing Image-Audio Retrieval

Abstract

Remote sensing image-audio retrieval (RSIAR) has emerged as a research topic in recent years, and many methods have been proposed for it. These methods achieve good retrieval results, but two problems remain: the audio modality lacks discriminability, and a heterogeneous gap exists between audio and images. Both problems leave the cross-modal common embedding space for audio and images suboptimal, which often degrades retrieval performance. This article proposes a novel RSIAR method, multimodal fusion remote sensing image-audio retrieval (MMFR), to address them. MMFR first converts the original audio input to text. It then uses a feature fusion module to obtain a representation that fuses in text information, replacing the original audio-only representation. Fusing text information makes the pronunciation-based audio feature more semantically discriminable and turns it into a more 'high-level' fused feature that can cross the heterogeneous gap. Seven different fusion methods are tried in the feature fusion module. In addition, the triplet loss, the semantic loss, and the consistency loss are used to optimize the common retrieval space. Extensive experiments conducted on the UCM_IV, RSICD_IV, and SYDNE_IV datasets demonstrate that MMFR outperforms state-of-the-art methods.
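
To make the fusion and triplet-loss ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say which seven fusion methods were tried, so simple concatenation stands in as an assumed example, and the function names (`fuse_concat`, `triplet_loss`), the cosine-similarity hinge form, the margin of 0.2, and the toy vectors are all hypothetical. The semantic and consistency losses are omitted because the abstract does not define them.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_concat(audio_feat, text_feat):
    """Assumed fusion method: concatenate audio and text features, then normalize.

    The paper tries seven fusion methods; concatenation is only an illustration.
    """
    return l2_normalize(list(audio_feat) + list(text_feat))


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: the anchor should be closer to positive than negative.

    The margin value is an assumption, not a value taken from the paper.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


# Toy features: a 2-d audio feature and a 2-d feature of its transcribed text.
fused = fuse_concat([1.0, 0.0], [0.0, 1.0])

# Toy image embeddings living in the same 4-d common space as the fused feature.
matching_image = [1.0, 0.0, 0.0, 1.0]
other_image = [0.0, 1.0, 1.0, 0.0]

loss_correct = triplet_loss(fused, matching_image, other_image)
loss_swapped = triplet_loss(fused, other_image, matching_image)
print(loss_correct, loss_swapped)
```

In this toy setup the loss is zero when the fused audio-text feature already sits closer to its matching image than to the other image, and positive when the two images are swapped. That is the property a triplet loss uses to shape a common retrieval space.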

Cite

Yang, R., Wang, S., Sun, Y., Zhang, H., Liao, Y., Gu, Y., … Jiao, L. (2022). Multimodal Fusion Remote Sensing Image-Audio Retrieval. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15, 6220–6235. https://doi.org/10.1109/JSTARS.2022.3194076
