Image captioning is a challenging task at the intersection of computer vision and natural language processing. Visual attention mechanisms have been used extensively in recent work; however, they take little account of the correlations among different image regions or of how attention should be distributed over them. This paper addresses these deficiencies and proposes a novel captioning model that extracts salient region correlations from image features, synthesizes the context of intra-image regions, and automatically distributes appropriate attention over the regions. The proposed Intra-Image Region Context (IIRC) model jointly learns the semantic correlations among regions within one image. It consists of two main parts. The first extracts feature vectors from the image through a convolutional neural network (CNN) and derives correlations among regions from these vectors with a recurrent neural network (RNN). The second generates the caption from the synthesized region contexts of the first network, with attention over the different region contexts. The model and a baseline are evaluated on the MSCOCO test server. The experimental results show that the model outperforms many strong models on the BLEU, METEOR, ROUGE-L, and CIDEr metrics. Moreover, the model excels at describing details, especially those related to position and action.
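The two-part pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration assuming PyTorch, with hypothetical layer names and sizes; a generic GRU encoder/decoder and a simple additive attention stand in for the paper's exact architecture, which the abstract does not specify.

```python
import torch
import torch.nn as nn

class IIRCCaptioner(nn.Module):
    """Illustrative sketch of the two-part pipeline the abstract describes.

    Part 1: an RNN run over CNN region features captures correlations
    among regions ("region contexts").
    Part 2: an attention-based decoder distributes attention over the
    region contexts while generating the caption word by word.
    All layer sizes and names here are hypothetical, not from the paper.
    """

    def __init__(self, feat_dim=2048, ctx_dim=512, embed_dim=512, vocab_size=10000):
        super().__init__()
        # Part 1: region-context encoder (GRU over precomputed CNN region features).
        self.region_rnn = nn.GRU(feat_dim, ctx_dim, batch_first=True)
        # Part 2: attention over region contexts + caption decoder.
        self.attn = nn.Linear(ctx_dim + embed_dim, 1)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRUCell(embed_dim + ctx_dim, embed_dim)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, region_feats, captions):
        # region_feats: (B, R, feat_dim) CNN features for R image regions.
        # captions: (B, T) ground-truth token ids (teacher forcing).
        contexts, _ = self.region_rnn(region_feats)            # (B, R, ctx_dim)
        B, T = captions.shape
        h = region_feats.new_zeros(B, self.decoder.hidden_size)
        logits = []
        for t in range(T):
            w = self.embed(captions[:, t])                     # (B, embed_dim)
            # Score each region context against the current decoder state.
            expanded = h.unsqueeze(1).expand(-1, contexts.size(1), -1)
            scores = self.attn(torch.cat([contexts, expanded], dim=-1)).squeeze(-1)
            alpha = torch.softmax(scores, dim=-1)              # attention weights (B, R)
            ctx = (alpha.unsqueeze(-1) * contexts).sum(dim=1)  # attended context (B, ctx_dim)
            h = self.decoder(torch.cat([w, ctx], dim=-1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                      # (B, T, vocab_size)

# Usage with random stand-in data: 2 images, 36 regions each, 15-token captions.
model = IIRCCaptioner()
feats = torch.randn(2, 36, 2048)
caps = torch.randint(0, 10000, (2, 15))
print(model(feats, caps).shape)  # torch.Size([2, 15, 10000])
```

Running the RNN over the region features, rather than attending to raw CNN features directly, is what lets each region's context reflect the other regions in the image, which is the key difference from standard visual attention that the abstract highlights.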
Citation: Wang, S., Mo, H., Xu, Y., Wu, W., & Zhou, Z. (2018). Intra-image region context for image captioning. In Lecture Notes in Computer Science (Vol. 11166, pp. 212–222). Springer. https://doi.org/10.1007/978-3-030-00764-5_20