QACE: Asking Questions to Evaluate an Image Caption

Abstract

In this paper, we propose QACE, a new metric based on Question Answering for Caption Evaluation. QACE generates questions from the evaluated caption and checks its content by asking those questions of either the reference caption or the source image. We first develop QACERef, which compares the answers obtained from the evaluated caption with those obtained from its reference, and report results competitive with state-of-the-art metrics. To go further, we propose QACEImg, which asks the questions directly on the image instead of the reference. QACEImg requires a Visual-QA system. Unfortunately, standard VQA models are framed as classification over only a few thousand answer categories. Instead, we propose Visual-T5, an abstractive VQA system. The resulting metric, QACEImg, is multi-modal, reference-less, and explainable. Our experiments show that QACEImg compares favorably with other reference-less metrics. We will release the pre-trained models to compute QACE.
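
The question-then-answer loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the question generator, the QA model, or how answers are compared, so every name below (`qace_sketch`, `toy_generate_qa`, `toy_answer`, `token_f1`) is hypothetical. Trivial cloze-style stubs stand in for the neural components, and token-level F1 is an assumed answer-similarity function.

```python
from collections import Counter
from typing import Callable, List, Tuple

STOPWORDS = {"a", "an", "the", "on", "in", "of"}  # assumption: tiny toy list


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between two answers (assumed similarity function)."""
    p, g = pred.lower().split(), gold.lower().split()
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def toy_generate_qa(caption: str) -> List[Tuple[str, str]]:
    """Stub question generator: blank out each content word (cloze style)."""
    words = caption.lower().split()
    pairs = []
    for i, word in enumerate(words):
        if word not in STOPWORDS:
            question = " ".join(words[:i] + ["____"] + words[i + 1:])
            pairs.append((question, word))
    return pairs


def toy_answer(question: str, context: str) -> str:
    """Stub QA model: return the first context word absent from the question."""
    seen = set(question.split())
    for word in context.lower().split():
        if word not in seen:
            return word
    return ""


def qace_sketch(
    candidate: str,
    context: str,
    generate_qa: Callable[[str], List[Tuple[str, str]]] = toy_generate_qa,
    answer: Callable[[str, str], str] = toy_answer,
    similarity: Callable[[str, str], float] = token_f1,
) -> float:
    """Average agreement between answers expected from the candidate caption
    and answers obtained by asking the same questions on the context."""
    qa_pairs = generate_qa(candidate)
    if not qa_pairs:
        return 0.0
    scores = [similarity(answer(q, context), a) for q, a in qa_pairs]
    return sum(scores) / len(scores)


score = qace_sketch("a dog runs on grass", "a dog runs on sand")
print(f"toy score: {score:.3f}")
```

In this sketch the context is a reference caption, which corresponds to the QACERef setting. For the reference-less QACEImg setting, `context` would instead be an image and `answer` a visual QA model; the surrounding loop would stay the same under these assumptions.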

Cite

Lee, H., Scialom, T., Yoon, S., Dernoncourt, F., & Jung, K. (2021). QACE: Asking Questions to Evaluate an Image Caption. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 4631–4638). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-emnlp.395
