Multiple-level feature-based network for image captioning

Abstract

Image captioning, the task of automatically describing the content of an image, has attracted growing interest in recent years. Because it requires both fine-grained visual understanding and fluent natural language generation, image captioning is a challenging task. Existing methods predominantly rely on a single kind of image feature to generate the description, neglecting other useful features; this strategy leads to unsatisfactory captioning results. To address this problem, we propose a multiple-level feature-based network for image captioning. In our method, three kinds of features are extracted from the image, each representing an analysis of the image at a different level. An attention mechanism in our network selectively attends to salient regions or attributes of each feature when predicting each word of the caption. Experimental results show that our model outperforms state-of-the-art methods on the MS-COCO dataset. Compared with other methods, our network produces more accurate subject predictions and more vivid descriptions.
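The abstract describes a decoder that extracts several levels of image features and attends over each level at every decoding step. Below is a minimal sketch of that idea in PyTorch; the class name, layer sizes, the shared additive-attention head, and the mean fusion of per-level contexts are all illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelAttentionDecoder(nn.Module):
    """LSTM decoder that attends over several image feature levels per word.

    A hypothetical sketch of the multiple-level feature idea: each level
    (e.g. region, attribute, global) is a set of feature vectors, and the
    decoder computes one attention context per level at every time step.
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=512):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive attention parameters (shared across levels in this sketch).
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)
        self.hid_proj = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def attend(self, feats, h):
        # feats: (batch, n, feat_dim); h: (batch, hidden_dim)
        e = self.att_score(torch.tanh(
            self.feat_proj(feats) + self.hid_proj(h).unsqueeze(1))).squeeze(-1)
        alpha = F.softmax(e, dim=1)                  # weights over items
        return (alpha.unsqueeze(-1) * feats).sum(1)  # weighted context vector

    def forward(self, feature_levels, captions):
        # feature_levels: list of (batch, n_i, feat_dim) tensors, one per
        # level; captions: (batch, T) token ids (teacher forcing).
        batch, T = captions.shape
        h = torch.zeros(batch, self.hidden_dim, device=captions.device)
        c = torch.zeros_like(h)
        logits = []
        for t in range(T):
            # Attend to every feature level, then average the contexts
            # (a simple fusion choice made for this sketch).
            ctx = torch.stack([self.attend(f, h) for f in feature_levels]).mean(0)
            step_in = torch.cat([self.embed(captions[:, t]), ctx], dim=1)
            h, c = self.lstm(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, T, vocab_size)


if __name__ == "__main__":
    decoder = MultiLevelAttentionDecoder(vocab_size=1000)
    # Three hypothetical feature levels: region, attribute, and global.
    feats = [torch.randn(2, n, 512) for n in (36, 10, 1)]
    caps = torch.randint(0, 1000, (2, 12))
    print(decoder(feats, caps).shape)  # torch.Size([2, 12, 1000])
```

Averaging the per-level contexts is only one plausible fusion scheme; a learned gate or concatenation over levels would be equally reasonable under the abstract's description.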

Citation (APA)

Zheng, K., Zhu, C., Lu, S., & Liu, Y. (2018). Multiple-level feature-based network for image captioning. In Lecture Notes in Computer Science (Vol. 11164, pp. 94–103). Springer. https://doi.org/10.1007/978-3-030-00776-8_9
