A Survey on Attention-Based Models for Image Captioning

Asmaa A.E. Osman; Mohamed A. Wahby Shalaby; Mona M. Soliman; Khaled M. Elsayed

Journal article (open access)

International Journal of Advanced Computer Science and Applications (2023) 14(2) 403-412

DOI: 10.14569/IJACSA.2023.0140249

Abstract

Image captioning is widely used in many real-world applications. The task first requires understanding the image with computer vision methods; natural language processing methods are then used to generate a textual description of it. Various approaches have been proposed for this task, and attention-based deep learning models have been shown to achieve state-of-the-art results. This paper presents a survey of attention-based models for image captioning, including new categories that were not covered in previous surveys. The attention-based approaches are classified into four main categories, which are further divided into subcategories, and all categories and subcategories are discussed in detail. Furthermore, state-of-the-art approaches are compared and their accuracy improvements are reported, especially for transformer-based models. Finally, a summary of the benchmark datasets and the main performance metrics is presented.
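The abstract describes attention as the link between the vision and language stages of a captioner. As an illustration only, the sketch below shows one common form, additive ("soft") visual attention in the style of Show, Attend and Tell (Xu et al., 2015). This choice is an assumption: the survey covers several attention families, and this code is not taken from the paper. At one decoding step, each image region feature a_i is scored against the decoder hidden state h as e_i = w^T tanh(W_a a_i + W_h h). The scores are normalized with a softmax into weights alpha_i, and the weighted sum of region features forms a context vector used to predict the next word. The function `soft_attention`, the parameters `W_a`, `W_h`, `w`, and the random toy values are all hypothetical and untrained.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(regions, hidden, W_a, W_h, w):
    """Additive soft attention over image region features (toy sketch).

    regions: list of L region feature vectors a_i, each of size D
    hidden:  decoder hidden state h of size H
    W_a (K x D), W_h (K x H), w (size K): hypothetical attention parameters
    Returns (alpha, context): alpha holds L weights that sum to 1, and
    context is the alpha-weighted sum of the region vectors (size D).
    """
    proj_h = matvec(W_h, hidden)
    scores = []
    for a in regions:
        proj_a = matvec(W_a, a)
        # e_i = w^T tanh(W_a a_i + W_h h)
        scores.append(sum(wk * math.tanh(pa + ph)
                          for wk, pa, ph in zip(w, proj_a, proj_h)))
    alpha = softmax(scores)
    dim = len(regions[0])
    context = [sum(alpha[i] * regions[i][d] for i in range(len(regions)))
               for d in range(dim)]
    return alpha, context


def random_matrix(rows, cols):
    """Random toy weights standing in for learned parameters."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    L, D, H, K = 4, 3, 2, 5  # regions, feature size, hidden size, attention size
    regions = random_matrix(L, D)
    hidden = random_matrix(1, H)[0]
    w = random_matrix(1, K)[0]
    alpha, context = soft_attention(regions, hidden,
                                    random_matrix(K, D), random_matrix(K, H), w)
    print("attention weights:", [round(a, 3) for a in alpha])
    print("context vector:", [round(c, 3) for c in context])
```

Because the weights come from a softmax, they are non-negative and sum to 1, so the context vector is a convex combination of the region features. In a trained captioner, `W_a`, `W_h` and `w` would be learned jointly with the image encoder and the language decoder, and the attention step would be repeated for every generated word.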

Citation (APA)

Osman, A. A. E., Shalaby, M. A. W., Soliman, M. M., & Elsayed, K. M. (2023). A Survey on Attention-Based Models for Image Captioning. International Journal of Advanced Computer Science and Applications, 14(2), 403–412. https://doi.org/10.14569/IJACSA.2023.0140249