"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning

Abstract

Well-formed, context-aware image captions and tags in enterprise content such as marketing material are critical for brand presence and content recall. Creating and updating them manually is nontrivial given the scale and tedium of the task. We propose a new unified Vision-Language (VL) model based on the One For All (OFA) model, with a focus on context-assisted image captioning, where the caption is generated from both the image and its context. Our approach aims to overcome the context-independent nature of existing approaches, which treat the image and the text independently. We exploit context by pretraining our model on datasets for three tasks: news image captioning, where the news article is the context; contextual visual entailment; and keyword extraction from the context. The second pretraining task is a new VL task, and we construct and release two datasets for it with 1.1M and 2.2K data instances. Our system achieves state-of-the-art results, with an improvement of up to 8.34 in CIDEr score on the benchmark news image captioning datasets. To the best of our knowledge, ours is the first effort to incorporate contextual information when pretraining models for VL tasks.

Cite

Kalarani, A. R., Bhattacharyya, P., Chhaya, N., & Shekhar, S. (2023). "Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 5, pp. 695–706). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-industry.67
