Non-autoregressive image captioning with counterfactuals-critical multi-agent learning


Abstract

Most image captioning models are autoregressive, i.e., they generate each word conditioned on the previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up inference by generating all words in parallel. Typically, these models use a word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to account for sentence-level consistency, resulting in inferior generation quality from these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system in which the positions of the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. In addition, we propose to exploit massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves performance comparable to state-of-the-art autoregressive models while delivering a 13.9× decoding speedup.
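The core idea of the abstract, decoding every position in parallel and crediting each position (agent) via a counterfactual baseline, can be illustrated with a toy sketch. The code below is hypothetical and not the paper's implementation: a simple word-overlap score stands in for the sentence-level reward (e.g., CIDEr), and each agent's baseline is the expected reward when only that agent's word is resampled from its own policy, all other words held fixed.

```python
import random

# Toy sentence-level reward: fraction of caption words found in the
# reference caption (a crude stand-in for CIDEr; purely illustrative).
def sentence_reward(caption, reference):
    ref = set(reference)
    return sum(1 for w in caption if w in ref) / max(len(caption), 1)

def counterfactual_advantages(probs, sampled, reference, vocab):
    """Advantage for agent i = R(sampled caption) minus a counterfactual
    baseline: the expected reward when only position i's word is
    resampled from that position's policy (other positions fixed)."""
    r_sampled = sentence_reward(sampled, reference)
    advantages = []
    for i in range(len(sampled)):
        baseline = 0.0
        for w, p in zip(vocab, probs[i]):
            counterfactual = sampled[:i] + [w] + sampled[i + 1:]
            baseline += p * sentence_reward(counterfactual, reference)
        advantages.append(r_sampled - baseline)
    return advantages

vocab = ["a", "dog", "cat", "runs", "sleeps"]
# One policy distribution per position; all positions decode in parallel.
probs = [
    [0.90, 0.05, 0.03, 0.01, 0.01],
    [0.10, 0.60, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.70, 0.15],
]
random.seed(0)
sampled = [random.choices(vocab, weights=p)[0] for p in probs]
advantages = counterfactual_advantages(probs, sampled,
                                       ["a", "dog", "runs"], vocab)
```

Because the baseline varies only position i's word, each advantage isolates that agent's marginal contribution to the shared sentence-level reward, which is what allows all positions to be trained cooperatively despite being decoded independently.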

Citation (APA)

Guo, L., Liu, J., Zhu, X., He, X., Jiang, J., & Lu, H. (2020). Non-autoregressive image captioning with counterfactuals-critical multi-agent learning. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 767–773). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/107
