The role of syntactic planning in compositional image captioning

Abstract

Image captioning has focused on generalizing to images drawn from the same distribution as the training set, and not to the more challenging problem of generalizing to different distributions of images. Recently, Nikolaus et al. (2019) introduced a dataset to assess compositional generalization in image captioning, where models are evaluated on their ability to describe images with unseen adjective-noun and noun-verb compositions. In this work, we investigate different methods to improve compositional generalization by planning the syntactic structure of a caption. Our experiments show that jointly modeling tokens and syntactic tags enhances generalization in both RNN- and Transformer-based models, while also improving performance on standard metrics.
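
As a rough illustration of what jointly modeling tokens and syntactic tags can look like on the data side, the sketch below builds a single target sequence in which each word is preceded by its tag, so a decoder would predict the syntactic category before the word itself. This is only one possible setup and the abstract does not say which formulation the authors use. The function name, the angle-bracket tag format, and the example caption with its hand-assigned coarse part-of-speech tags are all hypothetical.

```python
from typing import List, Tuple


def interleave_tags_and_tokens(tagged_caption: List[Tuple[str, str]]) -> List[str]:
    """Build one target sequence in which each syntactic tag precedes its token.

    `tagged_caption` is a list of (token, tag) pairs. The "<TAG>" format is an
    assumption made for this sketch, not a convention taken from the paper.
    """
    sequence: List[str] = []
    for token, tag in tagged_caption:
        sequence.append(f"<{tag}>")  # emit the tag first, as a one-step "plan"
        sequence.append(token)       # then the word that realizes it
    return sequence


# Hypothetical caption with hand-assigned tags (not drawn from any dataset).
caption = [("a", "DET"), ("white", "ADJ"), ("dog", "NOUN"), ("runs", "VERB")]
joint_target = interleave_tags_and_tokens(caption)
print(joint_target)
```

A captioning model trained on sequences like `joint_target` shares one vocabulary for tags and words. Alternatives, such as predicting tags with a separate output head, would need a different data layout.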

Citation (APA)

Bugliarello, E., & Elliott, D. (2021). The role of syntactic planning in compositional image captioning. In EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 593–607). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.eacl-main.48
