Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention

Abstract

Automatically captioning images with proper descriptions has become an interesting and challenging problem. In this paper, we present a joint model, AICRL, which performs automatic image captioning based on ResNet50 and LSTM with soft attention. AICRL consists of one encoder and one decoder. The encoder adopts ResNet50, a convolutional neural network, to create a rich representation of the given image by embedding it into a fixed-length vector. The decoder combines an LSTM, a recurrent neural network, with a soft attention mechanism that selectively focuses on certain parts of the image when predicting the next word of the caption. We trained AICRL on the large MS COCO 2014 dataset to maximize the likelihood of the target description sentence given the training images, and evaluated it with metrics such as BLEU, METEOR, and CIDEr. Our experimental results indicate that AICRL is effective at generating captions for images.
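The paper itself does not include code; the following is a minimal PyTorch sketch of the architecture the abstract describes, namely a ResNet50 encoder producing spatial image features and an LSTM decoder with soft (additive) attention over those features. All class names, dimensions, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): ResNet50 encoder + LSTM decoder
# with soft attention, in the spirit of AICRL. Dimensions are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class Encoder(nn.Module):
    """ResNet50 backbone; returns a grid of spatial features for attention."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(weights=None)  # pretrained weights optional
        # Drop avgpool and fc to keep the 7x7x2048 spatial feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):                      # (B, 3, 224, 224)
        feats = self.backbone(images)               # (B, 2048, 7, 7)
        return feats.flatten(2).permute(0, 2, 1)    # (B, 49, 2048)


class SoftAttention(nn.Module):
    """Additive soft attention over the 49 encoder regions."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):               # feats: (B, 49, D)
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                              # (B, 49) region scores
        alpha = torch.softmax(e, dim=1)             # attention weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)  # (B, D)
        return context, alpha


class Decoder(nn.Module):
    """LSTM decoder: attend over image regions, then emit the next word."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256,
                 hidden_dim=512, attn_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = SoftAttention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):             # captions: (B, T) token ids
        B, T = captions.shape
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(T):
            context, _ = self.attention(feats, h)   # focus on image regions
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.fc(h))
        return torch.stack(logits, dim=1)           # (B, T, vocab_size)
```

Training such a model would minimize cross-entropy between these logits and the ground-truth next tokens, which corresponds to maximizing the likelihood of the target caption given the image, as described in the abstract.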

Citation (APA)

Chu, Y., Yue, X., Yu, L., Sergei, M., & Wang, Z. (2020). Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention. Wireless Communications and Mobile Computing, 2020. https://doi.org/10.1155/2020/8909458
