Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task

Abstract

Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems, such as ROUGE and BLEU, generally focus on the lexical overlap between candidate and reference answers. However, bias may appear when these metrics are applied to specific question types, especially questions that ask for yes-no opinions or entity lists. In this paper, we adapt the metrics so that n-gram overlap correlates better with human judgment for answers to these two question types. Statistical analysis proves the effectiveness of our approach. Our adaptations may provide positive guidance for the development of real-scene MRC systems.
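To make the bias concrete, the following is a minimal sketch of plain n-gram overlap scoring, roughly in the spirit of ROUGE-N recall and BLEU-style n-gram precision. It is not the adapted metric proposed in the paper, whose details are not given in the abstract. The function names `ngrams` and `ngram_overlap` and the example sentences are assumptions made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(candidate, reference, n=1):
    """Return (precision, recall) of clipped n-gram overlap.

    A simplified stand-in for overlap-based metrics, not the official
    ROUGE or BLEU implementation (no brevity penalty, no smoothing).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall


# Made-up yes-no example: the reference answer says "yes".
reference = "yes it is safe to eat"
agreeing = "yes it is safe"            # same opinion, shorter wording
contradicting = "no it is safe to eat"  # opposite opinion, same wording

agree_p, agree_r = ngram_overlap(agreeing, reference)
contra_p, contra_r = ngram_overlap(contradicting, reference)
print(f"agreeing:      precision={agree_p:.2f} recall={agree_r:.2f}")
print(f"contradicting: precision={contra_p:.2f} recall={contra_r:.2f}")
```

In this toy case the contradicting answer shares five of the six reference unigrams while the agreeing answer shares only four, so a recall-oriented overlap score ranks the wrong opinion higher. This is the kind of mismatch with human judgment that the abstract describes for yes-no questions.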

Cite

Yang, A., Liu, K., Liu, J., Lyu, Y., & Li, S. (2018). Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 98–104). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-2611
