Abstract
Question generation (QG) is the problem of automatically generating questions from inputs such as declarative sentences. The Question Generation Shared Task Evaluation Challenge (QG-STEC) Task B, which took place in 2010, evaluated several state-of-the-art QG systems. However, analysis of the evaluation results was hampered by low inter-rater reliability. We adapted Nonaka & Takeuchi's knowledge creation cycle to the task of improving the evaluation annotation guidelines; a preliminary test shows clearly improved inter-rater reliability.
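For illustration, inter-rater reliability between two annotators is commonly quantified with a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is a minimal, self-contained example of computing kappa over hypothetical quality ratings of generated questions; the labels and scale are illustrative assumptions, and the paper may report a different agreement statistic.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of generated questions on a 1-3 quality scale.
rater_a = [1, 2, 2, 3, 1, 2, 3, 3]
rater_b = [1, 2, 3, 3, 1, 2, 2, 3]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # ~0.62 here
```

Values near 1 indicate strong agreement beyond chance; values near 0 indicate agreement no better than chance, which is the kind of outcome that motivates revising annotation guidelines.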
Citation
Godwin, K., & Piwek, P. (2016). Collecting reliable human judgements on machine-generated language: The case of the QG-STEC data. In INLG 2016 - 9th International Natural Language Generation Conference, Proceedings of the Conference (pp. 212–216). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-6634