What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation

Abstract

Generative conversational agents are known to suffer from problems like inconsistency and hallucination, and a major challenge in studying these issues remains evaluation: they are not properly reflected in common text generation metrics like perplexity or BLEU, and alternative implicit methods like semantic similarity or NLI labels can be misleading when a few specific tokens are decisive. In this work we propose ConsisTest, a factual consistency benchmark based on PersonaChat that includes both WH and Y/N questions, along with a hybrid evaluation pipeline that aims to get the best of symbolic and sub-symbolic methods. Using these, and focusing on pretrained generative models like BART, we provide a detailed analysis of how the model's factual consistency is affected by variations in question and context.
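The claim that surface similarity can be misleading when a few tokens are decisive can be illustrated with a small sketch. It assumes a hypothetical persona fact ("my name is Sarah") and uses bag-of-words F1 as a stand-in for overlap-based metrics. It contrasts that with a crude, purely symbolic check: a keyword match for Y/N answers, and a required gold token for WH answers. The helper names (`unigram_f1`, `yn_label`, `is_consistent`) and the keyword lists are illustrative assumptions. The sketch does not reproduce the paper's hybrid pipeline, which also relies on sub-symbolic components.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword lists for a crude symbolic Y/N reading (illustration only).
YES_WORDS = {"yes", "yeah", "yep", "sure", "correct"}
NO_WORDS = {"no", "nope", "not", "never"}


def tokens(text: str) -> list[str]:
    # Lowercase the text and keep simple word tokens (letters, digits, apostrophes).
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    # Bag-of-words F1 overlap: a stand-in for surface similarity metrics.
    cand, ref = tokens(candidate), tokens(reference)
    common = sum((Counter(cand) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)


def yn_label(response: str) -> Optional[str]:
    # Map a response to "yes"/"no" by keyword lookup; negation words take priority.
    toks = set(tokens(response))
    if toks & NO_WORDS:
        return "no"
    if toks & YES_WORDS:
        return "yes"
    return None


def is_consistent(response: str, question_type: str, gold: str) -> bool:
    # Y/N questions: compare the extracted label with the gold label.
    if question_type == "yn":
        return yn_label(response) == gold
    # WH questions: every decisive gold token must appear in the response.
    resp = set(tokens(response))
    return all(tok in resp for tok in tokens(gold))


if __name__ == "__main__":
    # Three of four tokens overlap, so the surface score is high...
    print(unigram_f1("my name is emma", "my name is sarah"))  # 0.75
    # ...yet the one decisive token contradicts the persona fact.
    print(is_consistent("My name is Emma.", "wh", "Sarah"))  # False
    print(is_consistent("Yes, I have two dogs.", "yn", "yes"))  # True
```

The overlap score treats the wrong name as a near match, while the symbolic check fails it on the single token that matters. Keyword rules like these are brittle, for example with paraphrased or hedged answers. That brittleness is the usual argument for combining them with learned (sub-symbolic) methods.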

Citation (APA)

Lotfi, E., De Bruyn, M., Buhmann, J., & Daelemans, W. (2022). What Was Your Name Again? Interrogating Generative Conversational Models For Factual Consistency Evaluation. In GEM 2022 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings of the Workshop (pp. 509–519). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.gem-1.47