Validity-Based Sampling and Smoothing Methods for Multiple Reference Image Captioning

Abstract

In image captioning, multiple captions are often provided as ground truths, since a valid caption is not always uniquely determined. Conventional methods randomly select a single caption and treat it as correct, and few effective training methods make use of all the given captions. In this paper, we propose two training techniques for making effective use of multiple reference captions: 1) validity-based caption sampling (VBCS), which prioritizes the use of captions that are estimated to be highly valid during training, and 2) weighted caption smoothing (WCS), which applies smoothing only to the relevant words in the reference caption so that multiple reference captions are reflected simultaneously. Experiments show that our proposed methods improve CIDEr by 2.6 points and BLEU4 by 0.9 points over the baseline on the MSCOCO dataset.
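The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how validity is estimated or how smoothing weights are computed, so everything below is an assumption made for illustration: the function names `sample_caption` and `smoothed_target`, the externally supplied `validity_scores`, the `epsilon` smoothing mass, and the choice to weight alternative words by their frequency in the other reference captions. It is not the authors' implementation.

```python
import random
from collections import Counter


def sample_caption(captions, validity_scores, rng=random):
    """VBCS-style sampling (assumed form): draw one reference caption with
    probability proportional to its validity score, instead of uniformly."""
    return rng.choices(captions, weights=validity_scores, k=1)[0]


def smoothed_target(target_word, other_captions, epsilon=0.1):
    """WCS-style smoothing (assumed form): keep 1 - epsilon on the target word
    and spread epsilon only over words found in the other reference captions,
    weighted by how often they occur there."""
    counts = Counter(
        w for cap in other_captions for w in cap.split() if w != target_word
    )
    total = sum(counts.values())
    if total == 0:
        # No alternative words available: fall back to a one-hot target.
        return {target_word: 1.0}
    dist = {w: epsilon * c / total for w, c in counts.items()}
    dist[target_word] = 1.0 - epsilon
    return dist


if __name__ == "__main__":
    refs = ["a dog runs", "a dog is running", "the dog runs fast"]
    scores = [0.2, 0.5, 0.3]  # hypothetical validity estimates
    print(sample_caption(refs, scores, random.Random(0)))
    print(smoothed_target("dog", ["a puppy runs", "a dog runs"]))
```

In contrast to uniform label smoothing, which spreads mass over the whole vocabulary, this sketch restricts the smoothed mass to words that actually appear in other references, which is one plausible reading of "smoothing only to the relevant words".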

Cite

Nagasawa, S., Watanabe, Y., & Iyatomi, H. (2021). Validity-Based Sampling and Smoothing Methods for Multiple Reference Image Captioning. In Multimodal Artificial Intelligence, MAI Workshop 2021 - Proceedings of the 3rd Workshop (pp. 36–41). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.maiworkshop-1.6
