A Survey on Attention-Based Models for Image Captioning

Abstract

Image captioning is widely used in many real-world applications. The task first applies computer vision methods to understand the image, then uses natural language processing methods to produce a description of it. Different approaches have been proposed to solve this task, and deep learning attention-based models have proven to be the state of the art. This paper presents a survey of attention-based models for image captioning, including new categories that were not covered in other survey papers. The attention-based approaches are classified into four main categories, each further divided into subcategories, and all categories and subcategories are discussed in detail. Furthermore, the state-of-the-art approaches are compared and their accuracy improvements are reported, especially for the transformer-based models, and a summary of the benchmark datasets and the main performance metrics is presented.

Cite

Osman, A. A. E., Shalaby, M. A. W., Soliman, M. M., & Elsayed, K. M. (2023). A Survey on Attention-Based Models for Image Captioning. International Journal of Advanced Computer Science and Applications, 14(2), 403–412. https://doi.org/10.14569/IJACSA.2023.0140249
