Learning an unreferenced metric for online dialogue evaluation

42 citations · 140 Mendeley readers

Abstract

Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue. There have been recent efforts to develop automatic dialogue evaluation metrics, but most of them do not generalize to unseen datasets and/or need a human-generated reference response during inference, making it infeasible for online evaluation. Here, we propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances, and leverages the temporal transitions that exist between them. We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
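To make the idea concrete, below is a minimal, hedged sketch of an unreferenced metric in the spirit described above: a pre-trained language model encodes the dialogue context and a candidate response, and a small learned head scores how plausible the response is as a continuation. The model name (`bert-base-uncased`), the `UnreferencedScorer` class, and the MLP head are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of an unreferenced dialogue metric: encode context and response with a
# pre-trained LM, then score the transition with a small learned head.
# All names here are illustrative, not the paper's official code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class UnreferencedScorer(nn.Module):
    def __init__(self, lm_name: str = "bert-base-uncased", hidden: int = 256):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(lm_name)
        self.encoder = AutoModel.from_pretrained(lm_name)
        dim = self.encoder.config.hidden_size
        # Scores how plausible the response is as a continuation of the context.
        self.head = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def _embed(self, texts):
        batch = self.tokenizer(
            texts, return_tensors="pt", padding=True, truncation=True
        )
        out = self.encoder(**batch).last_hidden_state
        return out[:, 0]  # [CLS]-style utterance representation

    def forward(self, contexts, responses):
        ctx, rsp = self._embed(contexts), self._embed(responses)
        return torch.sigmoid(self.head(torch.cat([ctx, rsp], dim=-1))).squeeze(-1)


# Training (not shown) would contrast true context/response pairs against
# sampled negative responses, so no human reference is needed at inference:
# scorer = UnreferencedScorer()
# scores = scorer(["how are you today?"], ["i'm doing great, thanks!"])
```

Because the score depends only on the context and the candidate response, such a metric can be applied online, without a ground-truth reference response.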

Citation (APA)

Sinha, K., Parthasarathi, P., Wang, J., Lowe, R., Hamilton, W. L., & Pineau, J. (2020). Learning an unreferenced metric for online dialogue evaluation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 2430–2441). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.220
