ΔBLEU: A discriminative metric for generation tasks with intrinsically diverse targets

89 citations of this article
203 Mendeley readers have this article in their library

Abstract

We introduce Discriminative BLEU (ΔBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [-1, +1] to weight multi-reference BLEU. In tasks involving generation of conversational responses, ΔBLEU correlates reasonably with human judgments and outperforms sentence-level and IBM BLEU in terms of both Spearman's ρ and Kendall's τ.
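The abstract's core idea, that human quality ratings in [-1, +1] weight the credit a hypothesis gets for matching each reference, can be sketched as a rating-weighted n-gram precision. This is a simplified illustration of the concept, not the paper's exact clipping and aggregation scheme; the function name and toy data are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def delta_precision(hypothesis, references, ratings, n):
    """Rating-weighted n-gram precision in the spirit of Delta-BLEU.

    Each hypothesis n-gram found in some reference is credited with the
    highest human rating (in [-1, +1]) among the references containing
    it; n-grams absent from all references contribute 0 to the numerator.
    A sketch only -- the published metric also applies BLEU-style
    clipping, combines orders n = 1..4, and adds a brevity penalty.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_ngrams = [set(ngrams(ref, n)) for ref in references]
    numerator = 0.0
    denominator = 0
    for gram, count in hyp_counts.items():
        # Best rating among the references that contain this n-gram.
        weights = [w for grams, w in zip(ref_ngrams, ratings) if gram in grams]
        if weights:
            numerator += max(weights) * count
        denominator += count
    return numerator / denominator if denominator else 0.0


hyp = "i totally agree with you".split()
refs = ["i agree with you".split(), "no way".split()]
ratings = [0.9, -0.5]  # hypothetical human scores for the two references
print(delta_precision(hyp, refs, ratings, 1))  # 4 of 5 unigrams match at 0.9
```

Because ratings can be negative, matching a poorly rated reference actively lowers the score, which is what makes the metric discriminative rather than a plain multi-reference overlap.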

APA

Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., … Dolan, B. (2015). ΔBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 445–450). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/P15-2073
