Attention-Based Deep Learning Model for Image Captioning: A Comparative Study

Abstract

Image captioning is the task of automatically generating a textual description of an image. It sits at the intersection of computer vision and natural language processing within artificial intelligence (AI), serving as a bridge between the visual and language domains. Approaches to image captioning fall into two categories: sentence-based generation and single-word generation. Deep learning has become the main driver of many new applications and has also become far more accessible in terms of its learning curve; applying deep learning models to image captioning can improve description accuracy. Attention mechanisms are the rising trend in deep learning models for caption generation. This paper presents a comparative study of attention-based deep learning models for image captioning, analyzing their performance, advantages, and weaknesses. It also discusses the datasets commonly used for image captioning and the evaluation metrics used to measure accuracy.
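The attention mechanisms surveyed here typically let the caption decoder weight different regions of the image at each word-generation step. A minimal NumPy sketch of Bahdanau-style (additive) attention over CNN feature vectors is shown below; the function name, shapes, and randomly initialized projection matrices (`W_f`, `W_h`, `v`) are illustrative assumptions, not the specific models compared in the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def additive_attention(features, hidden, W_f, W_h, v):
    """Additive (Bahdanau-style) attention over image feature vectors.

    features: (L, D) annotation vectors, e.g. from a CNN feature map
    hidden:   (H,)   current decoder hidden state
    W_f: (D, A), W_h: (H, A), v: (A,)  learned projections (random here)
    Returns the context vector (D,) and the attention weights (L,).
    """
    # Score each image region against the decoder state, then normalize.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # (L,)
    alpha = softmax(scores)                               # attention weights
    context = alpha @ features                            # weighted sum of regions
    return context, alpha

# Toy example with hypothetical dimensions.
rng = np.random.default_rng(0)
L, D, H, A = 4, 8, 6, 5          # regions, feature dim, hidden dim, attn dim
features = rng.standard_normal((L, D))
hidden = rng.standard_normal(H)
ctx, alpha = additive_attention(features, hidden,
                                rng.standard_normal((D, A)),
                                rng.standard_normal((H, A)),
                                rng.standard_normal(A))
```

At each decoding step the context vector `ctx` is fed into the decoder alongside the previous word, so the model attends to different image regions for different words.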

Citation (APA)
Khaing, P. P., & Yu, M. T. (2019). Attention-Based Deep Learning Model for Image Captioning: A Comparative Study. International Journal of Image, Graphics and Signal Processing, 11(6), 1–8. https://doi.org/10.5815/ijigsp.2019.06.01
