Image Captioning using Convolutional Neural Networks and Long Short Term Memory Cells

  • Das H

Abstract

This paper presents an efficient approach to captioning an image using a Convolutional Neural Network (CNN) combined with a Recurrent Neural Network (RNN) built from Long Short-Term Memory (LSTM) cells. Image captioning is a task at the intersection of deep learning and computer vision: generating a relevant caption for a given input image. Research in this area includes tuning the hyperparameters of the CNN and the RNN so that the generated captions are as accurate as possible. In outline, the image is given as input to the CNN, which outputs a feature map; this feature map is then passed to the RNN, which outputs a sentence describing the image. Image captioning is a relevant research topic because it demonstrates the capability of an encoder-decoder network composed of a CNN and an RNN, and it may open pathways for further research on other types of neural networks.
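The encoder-decoder flow outlined in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the toy vocabulary, the random untrained weights, the layer sizes, and the function names `encode`, `lstm_step`, and `caption` are all assumptions made for illustration. It also pools each feature map into a single number to form a feature vector, which is one common way to hand CNN output to an LSTM; the abstract does not say how the paper does this. Because the weights are untrained, the generated words are arbitrary; only the data flow is meaningful.

```python
import math
import random

# Toy vocabulary (assumption, for illustration only).
VOCAB = ["<start>", "<end>", "a", "dog", "cat", "on", "grass"]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(image, kernels):
    """CNN-style encoder: valid 2D convolution, ReLU, then global average pooling."""
    feats = []
    for k in kernels:
        kh, kw = len(k), len(k[0])
        total, count = 0.0, 0
        for i in range(len(image) - kh + 1):
            for j in range(len(image[0]) - kw + 1):
                s = sum(image[i + a][j + b] * k[a][b]
                        for a in range(kh) for b in range(kw))
                total += max(0.0, s)  # ReLU
                count += 1
        feats.append(total / count)   # one pooled value per kernel
    return feats

def lstm_step(x, h, c, W):
    """One LSTM cell step (bias terms omitted for brevity)."""
    z = x + h  # concatenate input and previous hidden state
    i = [sigmoid(v) for v in matvec(W["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(W["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(W["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(W["g"], z)]  # candidate cell state
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new

def caption(image, seed=0, hidden=8, max_len=6):
    """Greedy decoding: image -> features -> LSTM -> word sequence."""
    rng = random.Random(seed)  # untrained, random weights (assumption)
    kernels = [rand_matrix(3, 3, rng) for _ in range(4)]
    embed = rand_matrix(len(VOCAB), hidden, rng)      # token embeddings
    W_init = rand_matrix(hidden, len(kernels), rng)   # features -> initial hidden state
    W = {gate: rand_matrix(hidden, 2 * hidden, rng) for gate in "ifog"}
    W_out = rand_matrix(len(VOCAB), hidden, rng)      # hidden state -> vocabulary scores

    feats = encode(image, kernels)
    h = [math.tanh(v) for v in matvec(W_init, feats)]
    c = [0.0] * hidden
    token = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        h, c = lstm_step(embed[token], h, c, W)
        scores = matvec(W_out, h)
        token = max(range(len(VOCAB)), key=scores.__getitem__)  # greedy pick
        if VOCAB[token] == "<end>":
            break
        words.append(VOCAB[token])
    return words
```

Calling `caption(image)` on a small grayscale image given as a list of rows (for example a 6x6 grid of floats) returns a list of at most `max_len` vocabulary words. In a trained system the kernels, embeddings, and LSTM weights would be learned from image-caption pairs, and the hyperparameters mentioned in the abstract (such as `hidden` here) would be tuned.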

Cite

Das, H. (2022). Image Captioning using Convolutional Neural Networks and Long Short Term Memory Cells. International Journal of Recent Technology and Engineering (IJRTE), 11(1), 91–95. https://doi.org/10.35940/ijrte.e6741.0511122
