Deep unsupervised embedding for remote sensing image retrieval using textual cues

Abstract

Compared to image-image retrieval, text-image retrieval has been less investigated in the remote sensing community, possibly because of the complexity of appropriately tying textual data to the corresponding visual representations. Moreover, a single image may be described by multiple sentences, depending on the perception of the human labeler and the structure of the language they use, which magnifies the complexity even further. In this paper, we propose an unsupervised method for text-image retrieval in remote sensing imagery. In this method, image representations are obtained via visual Big Transfer (BiT) models, while textual descriptions are encoded via a bidirectional Long Short-Term Memory (Bi-LSTM) network. The training of the proposed retrieval architecture is optimized with an unsupervised embedding loss, which pulls the features of an image close to those of its corresponding textual description and pushes them away from the features of other images, and vice versa. To demonstrate the performance of the proposed architecture, experiments are performed on two datasets, obtaining plausible text-image retrieval outcomes.
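The embedding loss described above can be read as a symmetric hinge-style objective over image and sentence embeddings in a shared space. The sketch below is a minimal, illustrative interpretation in Python (PyTorch), not the authors' implementation: the names `symmetric_embedding_loss` and `TextEncoder`, the margin value, the pooling strategy, and the embedding dimensions are all assumptions for demonstration; in the paper the image branch is a pretrained BiT backbone projected into the same space.

```python
# Illustrative sketch (assumed details, not the authors' code): a symmetric
# hinge-style embedding loss that pulls each image close to its own caption
# and pushes it away from other captions (and vice versa), plus a Bi-LSTM
# text encoder. Margin, dimensions, and pooling are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def symmetric_embedding_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (batch, dim) embeddings of matched image/caption pairs."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    sim = img_emb @ txt_emb.t()                 # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)               # similarity of each matched pair
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # Every mismatched pair should score at least `margin` below the match,
    # in both retrieval directions (image->text and text->image).
    cost_i2t = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_t2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    return cost_i2t.mean() + cost_t2i.mean()


class TextEncoder(nn.Module):
    """Bi-LSTM sentence encoder projected into the shared embedding space."""
    def __init__(self, vocab_size, word_dim=300, hidden=512, embed_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.lstm = nn.LSTM(word_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, embed_dim)

    def forward(self, tokens):                  # tokens: (batch, seq_len) word ids
        out, _ = self.lstm(self.embed(tokens))  # (batch, seq_len, 2*hidden)
        return self.proj(out.mean(dim=1))       # mean-pool over time steps
```

In such a setup, image embeddings would come from a BiT backbone followed by a linear projection to the same dimensionality, and retrieval would rank images (or sentences) by cosine similarity to the query embedding.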

Cite

APA

Al Rahhal, M. M., Bazi, Y., Abdullah, T., Mekhalfi, M. L., & Zuair, M. (2020). Deep unsupervised embedding for remote sensing image retrieval using textual cues. Applied Sciences (Switzerland), 10(24), 1–14. https://doi.org/10.3390/app10248931
