Multiple-level feature-based network for image captioning

Abstract

Image captioning, the task of automatically describing the content of an image, has attracted growing interest in recent years. Because it requires both fine-grained visual understanding and fluent natural language generation, image captioning is a challenging task. Existing methods predominantly rely on a single kind of image feature to generate the description, neglecting other useful features; this strategy leads to unsatisfactory captioning results. To address this problem, we propose a multiple-level feature-based network for image captioning. In our method, three kinds of features are extracted from the image, each representing an analysis of the image at a different level. An attention mechanism in our network selectively attends to salient regions or attributes of each feature when predicting each word of the caption. Experimental results show that our model outperforms state-of-the-art methods on the MS-COCO dataset. Compared with other methods, our network produces more accurate subject predictions and more vivid descriptions.
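The abstract describes a decoder that extracts several levels of image features and attends over each level at every decoding step. Below is a minimal sketch of that idea in PyTorch; the class name, layer sizes, the shared additive-attention head, and the mean fusion of per-level contexts are all illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelAttentionDecoder(nn.Module):
    """LSTM decoder that attends over several image feature levels per word.

    A hypothetical sketch of the multiple-level feature idea: each level
    (e.g. region, attribute, global) is a set of feature vectors, and the
    decoder computes one attention context per level at every time step.
    """

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=512):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Additive attention parameters (shared across levels in this sketch).
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)
        self.hid_proj = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def attend(self, feats, h):
        # feats: (batch, n, feat_dim); h: (batch, hidden_dim)
        e = self.att_score(torch.tanh(
            self.feat_proj(feats) + self.hid_proj(h).unsqueeze(1))).squeeze(-1)
        alpha = F.softmax(e, dim=1)                  # weights over items
        return (alpha.unsqueeze(-1) * feats).sum(1)  # weighted context vector

    def forward(self, feature_levels, captions):
        # feature_levels: list of (batch, n_i, feat_dim) tensors, one per
        # level; captions: (batch, T) token ids (teacher forcing).
        batch, T = captions.shape
        h = torch.zeros(batch, self.hidden_dim, device=captions.device)
        c = torch.zeros_like(h)
        logits = []
        for t in range(T):
            # Attend to every feature level, then average the contexts
            # (a simple fusion choice made for this sketch).
            ctx = torch.stack([self.attend(f, h) for f in feature_levels]).mean(0)
            step_in = torch.cat([self.embed(captions[:, t]), ctx], dim=1)
            h, c = self.lstm(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (batch, T, vocab_size)


if __name__ == "__main__":
    decoder = MultiLevelAttentionDecoder(vocab_size=1000)
    # Three hypothetical feature levels: region, attribute, and global.
    feats = [torch.randn(2, n, 512) for n in (36, 10, 1)]
    caps = torch.randint(0, 1000, (2, 12))
    print(decoder(feats, caps).shape)  # torch.Size([2, 12, 1000])
```

Averaging the per-level contexts is only one plausible fusion scheme; a learned gate or concatenation over levels would be equally reasonable under the abstract's description.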

Citation (APA)

Zheng, K., Zhu, C., Lu, S., & Liu, Y. (2018). Multiple-level feature-based network for image captioning. In Lecture Notes in Computer Science (Vol. 11164, pp. 94–103). Springer. https://doi.org/10.1007/978-3-030-00776-8_9
