Correlating human and automatic evaluation of a German surface realiser


Abstract

We examine correlations between native speaker judgements of automatically generated German text and automatic evaluation metrics. We consider a number of metrics from the MT and Summarisation communities and find that, for a relative ranking task, most automatic metrics perform equally well and correlate fairly strongly with the human judgements. In contrast, on a naturalness judgement task, the General Text Matcher (GTM) tool correlates best overall, although in general the correlation between human judgements and automatic metrics was quite weak. © 2009 ACL and AFNLP.
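The analysis the abstract describes can be sketched as a rank correlation between per-sentence human scores and automatic metric scores. The snippet below is a minimal illustration, not the paper's tooling: the score values are hypothetical, and `spearman` is a simple tie-free implementation (Spearman's rho computed as the Pearson correlation of ranks).

```python
# Hedged sketch: correlating hypothetical human naturalness judgements
# with hypothetical automatic metric scores (e.g. GTM) per sentence.
# This is NOT the paper's actual data or code.

def rank(values):
    """Return 1-based ranks (ties broken by position; fine for illustration)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the rank-transformed scores."""
    return pearson(rank(xs), rank(ys))

# Hypothetical per-sentence scores: human judgements vs. an automatic metric.
human = [4.5, 3.0, 2.0, 4.0, 1.5]
metric = [0.82, 0.60, 0.35, 0.75, 0.40]

print(round(spearman(human, metric), 3))  # → 0.9
```

A strong rho (near 1.0) would mirror the ranking-task finding; the weak correlations reported for the naturalness task would show up here as values closer to 0.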

Citation (APA)

Cahill, A. (2009). Correlating human and automatic evaluation of a German surface realiser. In ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. (pp. 97–100). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1667583.1667615
