Abstract
How do we know which grammatical error correction (GEC) system is best? A number of metrics have been proposed over the years, each motivated by weaknesses of previous metrics; however, the metrics themselves have not been compared to an empirical gold standard grounded in human judgments. We conducted the first human evaluation of GEC system outputs, and show that the rankings produced by metrics such as MaxMatch and I-measure do not correlate well with this ground truth. As a step towards better metrics, we also propose GLEU, a simple variant of BLEU, modified to account for both the source and the reference, and show that it hews much more closely to human judgments.
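The abstract only names GLEU as "a simple variant of BLEU, modified to account for both the source and the reference." As a rough illustration of that idea (reward n-gram overlap with the reference, penalize n-grams carried over from the erroneous source that the reference corrects), here is a minimal Python sketch. The function name `gleu_sketch`, the penalty scheme, and the single-reference setup are illustrative assumptions; the paper's exact formulation and its released implementation differ in detail.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def gleu_sketch(source, hypothesis, reference, max_n=4):
    """Toy GLEU-style score: BLEU-like n-gram precision against the
    reference, with a penalty for n-grams the hypothesis shares with
    the source but not with the reference (i.e. errors left unfixed).
    This penalty scheme is a simplification, not the paper's exact one."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp, ref, src = (ngrams(t, n) for t in (hypothesis, reference, source))
        total = sum(hyp.values())
        if total == 0:
            continue
        # Clipped matches with the reference, exactly as in BLEU.
        matches = sum(min(c, ref[g]) for g, c in hyp.items())
        # Hypothesis n-grams copied from the source that the reference rejects.
        penalty = sum(min(c, src[g]) for g, c in hyp.items() if g not in ref)
        p_n = max(matches - penalty, 0) / total
        log_precisions.append(math.log(p_n) if p_n > 0 else float("-inf"))
    if not log_precisions:
        return 0.0
    # BLEU's brevity penalty, computed against the reference length.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
    return bp * math.exp(sum(log_precisions) / len(log_precisions))

source = "she go to school yesterday".split()
hypothesis = "she went to school yesterday".split()
reference = "she went to school yesterday".split()
print(round(gleu_sketch(source, hypothesis, reference), 3))  # 1.0: exact match
```

The source-matching penalty is what distinguishes this from plain BLEU: a hypothesis that leaves the input unchanged can still share many n-grams with the reference, and the subtraction keeps such a "do-nothing" system from scoring well.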
Citation
Napoles, C., Sakaguchi, K., Post, M., & Tetreault, J. (2015). Ground truth for grammatical error correction metrics. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 588–593). Association for Computational Linguistics. https://doi.org/10.3115/v1/P15-2097