Image captioning is the task of generating a textual description of an image. It belongs to computer vision and image processing within artificial intelligence (AI), and it also serves as a bridge between vision processing and natural language processing. Image captioning has two parts: sentence-based generation and single-word generation. Deep learning has become the main driver of many new applications and has also become much more accessible in terms of its learning curve. Applying deep learning models to image captioning can improve the accuracy of the generated descriptions, and attention mechanisms are a growing trend in deep learning models for caption generation. This paper presents a comparative study of attention-based deep learning models for image captioning. It presents basic techniques for analyzing their performance, advantages, and weaknesses, and it discusses the datasets used for image captioning and the evaluation metrics used to test accuracy.
CITATION
Khaing, P. P., & Yu, M. T. (2019). Attention-Based Deep Learning Model for Image Captioning: A Comparative Study. International Journal of Image, Graphics and Signal Processing, 11(6), 1–8. https://doi.org/10.5815/ijigsp.2019.06.01