Guideline Bias in Wizard-of-Oz Dialogues

3 citations · 47 Mendeley readers

Abstract

NLP models struggle with generalization due to sampling and annotator bias. This paper focuses on a different kind of bias that has received very little attention: guideline bias, i.e., the bias introduced by how annotator guidelines are formulated. We examine two recently introduced dialogue datasets, CCPE-M and Taskmaster-1, both collected by trained assistants in a Wizard-of-Oz set-up. For CCPE-M, we show how a simple lexical bias for the word "like" in the guidelines biases the data collection. This bias, in effect, leads to poor performance on data without this bias: a preference elicitation architecture based on BERT suffers a 5.3% absolute drop in performance when "like" is replaced with a synonymous phrase, and a 13.2% drop when evaluated on out-of-sample data. For Taskmaster-1, we show how the order in which instructions are presented biases the data collection.
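The lexical perturbation described in the abstract can be sketched as a simple token substitution over test utterances. Note that the specific synonymous phrase used in the paper is not stated here; "am a fan of" is a hypothetical stand-in, and a naive word-boundary regex would also rewrite non-verb uses of "like" (e.g., as a preposition), which a real evaluation would need to filter.

```python
import re

def perturb_like(utterance: str, synonym: str = "am a fan of") -> str:
    """Replace the verb 'like' with a synonymous phrase to probe lexical bias.

    Caveat: this matches every standalone token 'like', including
    prepositional uses ("movies like that"), so real test-set construction
    would require a POS filter or manual checking.
    """
    return re.sub(r"\blike\b", synonym, utterance)

print(perturb_like("I like sci-fi movies."))
# A model trained only on guideline-biased 'like' utterances can then be
# evaluated on the perturbed set to measure the performance drop.
```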

Cite (APA)

Bach Hansen, V. P., & Søgaard, A. (2021). Guideline Bias in Wizard-of-Oz Dialogues. In BPPF 2021 - 1st Workshop on Benchmarking: Past, Present and Future, Proceedings (pp. 8–14). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.bppf-1.2
