REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning


Abstract

Popular metrics for evaluating image captioning systems, such as BLEU and CIDEr, provide a single score to gauge a system's overall effectiveness. This score is often not informative enough to indicate which specific errors a given system makes. In this study, we present REO, a fine-grained method for automatically evaluating image captioning systems. REO assesses the quality of captions from three perspectives: 1) Relevance to the ground truth, 2) Extraness of content that is irrelevant to the ground truth, and 3) Omission of elements present in the images and human references. Experiments on three benchmark datasets demonstrate that our method is more consistent with human judgments and yields more intuitive evaluation results than alternative metrics.
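The abstract gives only the intuition behind the three perspectives; the paper's actual formulation grounds captions in image and reference content and is not reproduced here. As a toy illustration only, the sketch below shows how three complementary scores of this kind could be derived from bag-of-words overlap between a candidate caption and its references. The function name `reo_scores` and the overlap heuristic are hypothetical assumptions for illustration, not the paper's method.

```python
from collections import Counter

def reo_scores(candidate: str, references: list[str]) -> dict[str, float]:
    """Toy, overlap-based illustration of Relevance/Extraness/Omission.

    NOT the REO metric from the paper (which uses grounded image-text
    representations); it only shows how the three scores complement
    each other.
    """
    cand = Counter(candidate.lower().split())
    refs = Counter()
    for r in references:
        refs |= Counter(r.lower().split())  # keep the max count of each reference word

    overlap = sum((cand & refs).values())   # caption words shared with the references
    cand_total = sum(cand.values())
    ref_total = sum(refs.values())

    relevance = overlap / cand_total if cand_total else 0.0       # precision-like
    extraness = 1.0 - relevance                                   # caption content absent from references
    omission = 1.0 - (overlap / ref_total if ref_total else 0.0)  # reference content the caption misses
    return {"relevance": relevance, "extraness": extraness, "omission": omission}

# A caption that hallucinates "frisbee" is penalized via extraness,
# while a caption that drops "beach" would be penalized via omission.
print(reo_scores("a dog catches a frisbee on the beach",
                 ["a dog runs on the beach", "a brown dog is running along the sand"]))
```

In this toy version, relevance and extraness sum to one by construction; the paper treats them as separate perspectives, so the decomposition above should be read only as an intuition aid.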

Citation (APA)

Jiang, M., Hu, J., Huang, Q., Zhang, L., Diesner, J., & Gao, J. (2019). REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 1475–1480). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1156
