Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowdsourcing high-quality NLG training data, using automatic quality control measures and evaluating different MRs with which to elicit data. We show that pictorial MRs result in better NL data being collected than logic-based MRs: utterances elicited by pictorial MRs are judged as significantly more natural, more informative, and better phrased, with a significant increase in average quality ratings (around 0.5 points on a 6-point scale) compared to logic-based MRs. As the MR becomes more complex, the benefits of pictorial stimuli increase. The collected data will be released as part of this submission.
CITATION STYLE
Novikova, J., Lemon, O., & Rieser, V. (2016). Crowd-sourcing NLG data: Pictures elicit better data. In INLG 2016 - 9th International Natural Language Generation Conference, Proceedings of the Conference (pp. 265–273). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-6644