Image Captioning and Comparison of Different Encoders

Abstract

Generating a sentence that describes a given image, known as image captioning, is one of the most intriguing topics in computer vision. It combines techniques from both image processing and natural language processing. Most current approaches are built on neural networks: a predefined convolutional neural network (CNN) extracts features from the image, and a uni-directional or bi-directional recurrent neural network (RNN) performs the language modelling. This paper discusses the models commonly used as the image encoder, namely Inception-V3, VGG19, VGG16 and InceptionResNetV2, while using a uni-directional LSTM for text generation. A comparative analysis of the results is then carried out using the Bilingual Evaluation Understudy (BLEU) score on the Flickr8k dataset.
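The setup the abstract describes is the standard CNN-encoder / LSTM-decoder ("merge") architecture for captioning. Below is a minimal sketch in Keras, not the authors' code: a frozen, pretrained InceptionV3 supplies image features (VGG16, VGG19 and InceptionResNetV2 are drop-in replacements via their respective keras.applications classes), and a uni-directional LSTM models the caption prefix. The vocabulary size, maximum caption length and embedding width are illustrative assumptions, not values from the paper.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000  # assumed vocabulary size for Flickr8k captions
MAX_LEN = 34       # assumed maximum caption length in tokens
EMBED_DIM = 256    # assumed embedding / hidden dimension

# Encoder: pretrained InceptionV3 with the classifier head removed;
# global average pooling yields a 2048-d feature vector per image.
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
cnn.trainable = False  # features come from a frozen, predefined model

# Decoder: image features and the caption prefix are projected to the
# same dimension, merged, and used to predict the next word.
img_in = Input(shape=(2048,))  # precomputed CNN features
img_feat = Dense(EMBED_DIM, activation="relu")(Dropout(0.5)(img_in))

cap_in = Input(shape=(MAX_LEN,))  # caption prefix as word indices
cap_emb = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(cap_in)
cap_feat = LSTM(EMBED_DIM)(Dropout(0.5)(cap_emb))  # uni-directional LSTM

merged = add([img_feat, cap_feat])
hidden = Dense(EMBED_DIM, activation="relu")(merged)
out = Dense(VOCAB_SIZE, activation="softmax")(hidden)  # next-word distribution

model = Model(inputs=[img_in, cap_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

The BLEU evaluation mentioned above can likewise be reproduced with NLTK's corpus_bleu; the tokenised captions here are made up purely for illustration, and smoothing is applied because such short sentences may share no 4-grams.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One list of reference captions per image, one generated caption per image.
references = [[["a", "dog", "runs", "on", "the", "grass"]]]
hypotheses = [["a", "dog", "is", "running", "on", "grass"]]

smooth = SmoothingFunction().method1
bleu4 = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=smooth)
print(f"BLEU-4: {bleu4:.4f}")
```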

Citation (APA)

Pal, A., Kar, S., Taneja, A., & Jadoun, V. K. (2020). Image captioning and comparison of different encoders. Journal of Physics: Conference Series, 1478, 012004. https://doi.org/10.1088/1742-6596/1478/1/012004
