Abstract
We introduce Discriminative BLEU (ΔBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs. Reference strings are scored for quality by human raters on a scale of [-1, +1], and these ratings are used to weight multi-reference BLEU. In tasks involving generation of conversational responses, ΔBLEU correlates reasonably with human judgments and outperforms both sentence-level BLEU and IBM BLEU in terms of Spearman's ρ and Kendall's τ.
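The core idea of weighting multi-reference n-gram matches by human quality ratings can be sketched as follows. This is a simplified illustration, not the paper's exact formulation: `delta_bleu_precision` is a hypothetical helper showing a single rating-weighted n-gram precision, omitting BLEU's geometric mean over n-gram orders and its brevity penalty.

```python
from collections import Counter

def delta_bleu_precision(hyp_tokens, refs, n=1):
    """Rating-weighted n-gram precision in the spirit of ΔBLEU (sketch).

    `refs` is a list of (reference_tokens, rating) pairs, with ratings
    in [-1, +1] as assigned by human raters. Matched n-grams are
    credited with the rating of the best-rated reference containing
    them; unmatched n-grams contribute nothing to the numerator.
    """
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    hyp_counts = ngrams(hyp_tokens)
    ref_counts = [(ngrams(r), q) for r, q in refs]
    num = den = 0.0
    for g, c in hyp_counts.items():
        # Clipped matches against every reference that contains this n-gram.
        matches = [(min(c, rc[g]), q) for rc, q in ref_counts if g in rc]
        if matches:
            clipped, q = max(matches, key=lambda t: t[1])  # best-rated reference
            num += q * clipped
        # Denominator: each hypothesis n-gram weighted by the best rating
        # available in the reference set (simplification).
        den += c * max(q for _, q in ref_counts)
    return num / den if den else 0.0
```

Note that a match against a negatively rated reference subtracts from the score, which is how the metric discriminates between plausible and poor references rather than rewarding all matches equally.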
Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., … Dolan, B. (2015). ΔBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 445–450). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/P15-2073