Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation with image-sentence ranking and uses a decoding mechanism that re-ranks candidate captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts than state-of-the-art captioning models.
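To make the re-ranking step at decoding time concrete, here is a minimal Python sketch. The function name `rerank_captions`, the `(caption, log-probability)` candidate format, and the toy similarity function are illustrative assumptions; in the paper, the similarity scores come from the jointly trained image-sentence ranking component, and the exact selection rule may differ.

```python
# Minimal sketch of similarity-based re-ranking at decoding time
# (an illustration of the idea, not the authors' exact implementation).
from typing import Callable, List, Tuple


def rerank_captions(
    image_features: object,                        # encoded image representation
    candidates: List[Tuple[str, float]],           # (caption, log-probability) pairs, e.g. from beam search
    similarity: Callable[[object, str], float],    # image-sentence similarity scorer
) -> str:
    """Return the candidate caption most similar to the image.

    Instead of keeping the most probable beam, candidates are re-ranked
    by their similarity to the image, which favors captions whose
    (possibly unseen) concept combinations match the visual content.
    """
    return max(candidates, key=lambda c: similarity(image_features, c[0]))[0]


if __name__ == "__main__":
    # Toy demonstration with dummy candidates and a dummy scorer
    # (hypothetical values, for illustration only).
    candidates = [("a white dog", -0.9), ("a brown dog", -1.2)]
    toy_similarity = lambda img, caption: 1.0 if "brown" in caption else 0.0
    print(rerank_captions("image-features", candidates, toy_similarity))  # "a brown dog"
```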
Citation
Nikolaus, M., Abdou, M., Lamm, M., Aralikatte, R., & Elliott, D. (2019). Compositional generalization in image captioning. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL 2019) (pp. 87–98). Association for Computational Linguistics. https://doi.org/10.18653/v1/k19-1009