Multi-level policy and reward reinforcement learning for image captioning

An An Liu; Ning Xu; Hanwang Zhang; Weizhi Nie; Yuting Su; Yongdong Zhang

Conference Proceedings

IJCAI International Joint Conference on Artificial Intelligence (2018) 2018-July 821-827

DOI: 10.24963/ijcai.2018/114


Abstract

Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and a single reward function, which do not fit the multi-level (word and sentence) and multi-modal (vision and language) nature of the task well. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance with respect to different evaluation metrics.
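The abstract does not give the fusion or reward formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the two policies are fused by a convex combination controlled by a scalar gate (fixed here, whereas the paper describes the fusion as adaptive) and that the two rewards are mixed by a scalar weight. The names `fuse_policies`, `combine_rewards`, `gate`, and `weight` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_policies(word_logits, sentence_logits, gate):
    """Assumed fusion rule: convex combination of two next-word distributions.

    gate = 1.0 uses only the word-level policy, gate = 0.0 only the
    sentence-level policy. In a trained model the gate would be predicted
    per step; here it is a fixed scalar for illustration.
    """
    if len(word_logits) != len(sentence_logits):
        raise ValueError("both policies must score the same vocabulary")
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    p_word = softmax(word_logits)
    p_sent = softmax(sentence_logits)
    return [gate * pw + (1.0 - gate) * ps for pw, ps in zip(p_word, p_sent)]


def combine_rewards(vision_language_reward, language_language_reward, weight=0.5):
    """Assumed reward rule: weighted sum of the two reward signals."""
    return weight * vision_language_reward + (1.0 - weight) * language_language_reward


# Toy three-word vocabulary with made-up scores.
fused = fuse_policies([2.0, 0.5, -1.0], [0.1, 1.5, 0.3], gate=0.7)
reward = combine_rewards(0.8, 0.6, weight=0.5)
```

Because each input distribution sums to one and the gate lies in [0, 1], the fused output is again a valid distribution that a sampling or greedy decoder could draw the next word from.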

Cite

Liu, A. A., Xu, N., Zhang, H., Nie, W., Su, Y., & Zhang, Y. (2018). Multi-level policy and reward reinforcement learning for image captioning. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 821–827). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/114
