Disfluencies in user utterances can trigger a chain of errors impacting all the modules of a dialogue system: natural language understanding, dialogue state tracking, and response generation. In this work, we first analyze existing dialogue datasets commonly used in research and show that they contain only a marginal number of disfluent utterances. Due to this relative absence of disfluencies in their training data, dialogue systems may critically fail when exposed to disfluent utterances. Following this observation, we propose to augment existing datasets with disfluent user utterances by paraphrasing fluent utterances into disfluent ones. Relying on a pre-trained language model, our few-shot disfluent paraphraser, guided by a disfluency classifier, can generate useful disfluent utterances for training better dialogue systems. We report improvements in both dialogue state tracking and response generation when the dialogue systems are trained on datasets augmented with our disfluent utterances.
Citation
Marie, B. (2023). Disfluency Generation for More Robust Dialogue Systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 11479–11488). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.728