Multi-level policy and reward reinforcement learning for image captioning

Abstract

Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and a single reward function, which do not fit the multi-level (word and sentence) and multi-modal (vision and language) nature of the task well. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance across different evaluation metrics.
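
The abstract does not specify how the two policies are fused or how the two rewards are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a scalar gate that convexly mixes the word-level and sentence-level distributions, a weighted sum of the two rewards, and a plain REINFORCE-style loss; the paper's guidance term is not modeled. All function names and parameters (`fuse_policies`, `gate`, `combined_reward`, `weight`, `reinforce_loss`, `baseline`) are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_policies(word_logits, sentence_logits, gate):
    """Assumed fusion: convex mix of two next-word distributions.

    gate is a hypothetical scalar in [0, 1]; 1.0 trusts only the
    word-level policy, 0.0 only the sentence-level policy.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    p_word = softmax(word_logits)
    p_sent = softmax(sentence_logits)
    return [gate * pw + (1.0 - gate) * ps for pw, ps in zip(p_word, p_sent)]


def combined_reward(vision_language_reward, language_language_reward, weight=0.5):
    """Assumed combination: weighted sum of the two reward signals."""
    return weight * vision_language_reward + (1.0 - weight) * language_language_reward


def reinforce_loss(probs, action, reward, baseline=0.0):
    """Standard REINFORCE loss for one sampled word: -(r - b) * log pi(a)."""
    return -(reward - baseline) * math.log(probs[action])


# Toy vocabulary of three words.
fused = fuse_policies([1.0, 0.0, -1.0], [0.0, 0.0, 0.0], gate=0.5)
reward = combined_reward(1.0, 0.0, weight=0.25)
loss = reinforce_loss(fused, action=0, reward=reward)
```

Because the fused output is a convex combination of two valid distributions, it still sums to one and can be sampled from directly. In a learned model the gate and the reward weight would presumably be produced by the network or tuned, rather than fixed by hand as here.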

Cite

Liu, A. A., Xu, N., Zhang, H., Nie, W., Su, Y., & Zhang, Y. (2018). Multi-level policy and reward reinforcement learning for image captioning. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 821–827). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/114
