Neural Image Caption Generation with Weighted Training and Reference

Guiguang Ding; Minghai Chen; Sicheng Zhao; Hui Chen; Jungong Han; Qiang Liu

Journal Article (Open Access)

Cognitive Computation (2019) 11(6) 763-777

DOI: 10.1007/s12559-018-9581-x

Abstract

Image captioning, which aims to automatically generate a sentence description for an image, has attracted much research attention in cognitive computing. The task is challenging because it requires combining techniques from both computer vision and natural language processing. Existing methods based on the CNN-RNN framework suffer from two main problems: in the training phase, all the words of a caption are treated equally, without considering the importance of different words; in the caption generation phase, the semantic objects or scenes might be misrecognized. In this paper, we propose a method based on the encoder-decoder framework, named Reference-based Long Short-Term Memory (R-LSTM), which aims to lead the model to generate a more descriptive sentence for the given image by introducing reference information. Specifically, we assign different weights to the words according to the correlation between words and images during the training phase. We additionally maximize the consensus score between the captions generated by the captioning model and the reference information from the neighboring images of the target image, which can reduce the misrecognition problem. We have conducted extensive experiments and comparisons on the benchmark datasets MS COCO and Flickr30k. The results show that the proposed approach can outperform the state-of-the-art approaches on all metrics, especially achieving a 10.37% improvement in terms of CIDEr on MS COCO. By analyzing the quality of the generated captions, we conclude that, through the introduction of reference information, our model can learn the key information of images and generate more trivial and relevant words for images.
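The two ideas in the abstract, per-word weights during training and a consensus score against reference captions at generation time, can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption for illustration: the function names `weighted_nll`, `pick_by_consensus`, and `word_overlap` are hypothetical, the weights are supplied by hand rather than derived from word-image correlation, the linear mix controlled by `alpha` is invented, and a simple word-set overlap stands in for whatever similarity measure the paper uses.

```python
import math


def weighted_nll(token_probs, weights):
    """Weighted negative log-likelihood of one ground-truth caption.

    token_probs[t] is the model's probability of the correct word at step t.
    weights[t] is an assumed importance weight for that word; with all
    weights equal to 1 this reduces to the usual unweighted loss.
    """
    if len(token_probs) != len(weights):
        raise ValueError("token_probs and weights must have the same length")
    return -sum(w * math.log(p) for p, w in zip(token_probs, weights))


def word_overlap(a, b):
    """Jaccard overlap of word sets: a stand-in similarity, not the paper's."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def pick_by_consensus(candidates, references, similarity=word_overlap, alpha=0.5):
    """Choose the candidate caption with the best combined score.

    candidates is a list of (caption, log_probability) pairs from the model.
    references are captions of neighboring images (assumed given).
    The score mixes the model log-probability with the mean similarity
    to the references; alpha is a hypothetical trade-off parameter.
    """
    if not references:
        raise ValueError("at least one reference caption is required")

    def score(item):
        caption, log_prob = item
        consensus = sum(similarity(caption, r) for r in references) / len(references)
        return alpha * log_prob + (1 - alpha) * consensus

    return max(candidates, key=score)[0]
```

In this sketch, a word with a larger weight contributes more to the training loss, and a candidate caption that agrees with the captions of neighboring images can be preferred over one with a slightly higher model probability.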

Ding, G., Chen, M., Zhao, S., Chen, H., Han, J., & Liu, Q. (2019). Neural Image Caption Generation with Weighted Training and Reference. Cognitive Computation, 11(6), 763–777. https://doi.org/10.1007/s12559-018-9581-x
