Bi-directional Image–Text Matching Deep Learning-Based Approaches: Concepts, Methodologies, Benchmarks and Challenges

This article is free to access.

Abstract

Image–text matching (retrieval) has attracted growing attention as multimodal data has grown. Given a textual query, the task returns the relevant images; given an image, it returns textual descriptions of the visual scene it depicts. The core challenge is computing an accurate similarity between a text and an image, which requires understanding both modalities and accurately extracting the relevant information from each. Many approaches have been proposed for matching textual data and visual content using deep learning (DL), but only a few reviews of DL-based image–text matching are available. In this review, we present and clarify modern DL-based techniques for the image–text matching problem through an extensive study of existing matching models, current architectures, benchmark datasets, and evaluation methods. First, we explain the matching task and illustrate a frequently used architecture. Second, we classify existing approaches according to two key concepts: the alignment between image and text, and the learning approach. Third, we report standard datasets and evaluation techniques. Finally, we outline current challenges to serve as inspiration for new researchers in this field.
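To make the bi-directional setting concrete, the following is a minimal sketch, not the authors' method. It assumes that image and text embeddings have already been produced in a shared space by some encoders (not shown; the toy vectors are hypothetical). It uses cosine similarity as the matching score and reports Recall@K, a widely used retrieval metric chosen here as an assumption, in both directions. For simplicity it assumes exactly one matching text per image, at the same index, whereas common benchmarks pair each image with several captions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (assumes non-zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(images: List[Vector], texts: List[Vector]) -> List[List[float]]:
    """sim[i][j] is the matching score between image i and text j."""
    return [[cosine_similarity(img, txt) for txt in texts] for img in images]


def rank_of_match(scores: List[float], target: int) -> int:
    """0-based position of the ground-truth candidate when sorted by score, best first."""
    order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    return order.index(target)


def recall_at_k(sim: List[List[float]], k: int) -> Tuple[float, float]:
    """Recall@K for both retrieval directions.

    Simplifying assumption: image i and text i form the only matching pair,
    so sim must be square.
    """
    n = len(sim)
    # Image-to-text: each row ranks all texts for one image query.
    i2t_hits = sum(rank_of_match(sim[i], i) < k for i in range(n))
    # Text-to-image: each column ranks all images for one text query.
    t2i_hits = sum(
        rank_of_match([sim[i][j] for i in range(n)], j) < k for j in range(n)
    )
    return i2t_hits / n, t2i_hits / n


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for encoder outputs.
    images = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
    texts = [[1.0, 0.0], [4.0, 3.0], [0.0, 1.0]]
    sim = similarity_matrix(images, texts)
    for k in (1, 2):
        i2t, t2i = recall_at_k(sim, k)
        print(f"R@{k}: image-to-text {i2t:.2f}, text-to-image {t2i:.2f}")
```

With these toy vectors, Recall@1 is 1/3 in both directions, while Recall@2 is 1.0 for image-to-text but only 2/3 for text-to-image. The same similarity matrix is read row-wise for one direction and column-wise for the other, which is why the two directions are typically reported separately.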

Citation (APA)

Ebaid, D. B., Madbouly, M. M., & El-Zoghabi, A. A. (2023, December 1). Bi-directional Image–Text Matching Deep Learning-Based Approaches: Concepts, Methodologies, Benchmarks and Challenges. International Journal of Computational Intelligence Systems. Springer Science and Business Media B.V. https://doi.org/10.1007/s44196-023-00260-3
