The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research, and standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog that trains unsupervised models to measure several desirable qualities of dialog. USR is shown to strongly correlate with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48, system-level: 1.0). It additionally produces an interpretable measure for each of these qualities.
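To make the reported numbers concrete, below is a minimal sketch (not the authors' code) of how turn-level and system-level correlations with human judgment are typically computed: turn-level correlates per-response metric scores with per-response human ratings, while system-level first averages scores per dialog system and then correlates the means. All scores in the example are hypothetical, and Spearman correlation is assumed.

```python
# Minimal sketch of turn-level vs. system-level correlation with human
# judgment. All scores below are hypothetical; Spearman correlation is
# assumed as the correlation statistic.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical scores: one automatic-metric score and one mean human
# rating per generated response, for responses from three dialog systems.
system_ids    = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
metric_scores = np.array([0.61, 0.34, 0.72, 0.55, 0.48, 0.66, 0.21, 0.39, 0.30])
human_scores  = np.array([4.0,  2.5,  4.5,  3.5,  3.0,  4.0,  1.5,  2.5,  2.0])

# Turn-level: correlate per-response metric scores with human ratings.
turn_rho, _ = spearmanr(metric_scores, human_scores)

# System-level: average each system's scores, then correlate the means.
sys_metric = [metric_scores[system_ids == s].mean() for s in np.unique(system_ids)]
sys_human  = [human_scores[system_ids == s].mean() for s in np.unique(system_ids)]
system_rho, _ = spearmanr(sys_metric, sys_human)

print(f"turn-level Spearman: {turn_rho:.2f}")      # correlation over responses
print(f"system-level Spearman: {system_rho:.2f}")  # correlation over systems
```

Note that with only a handful of systems being compared, a system-level Spearman of 1.0 means the metric ranks the systems in exactly the same order as human raters do.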
Citation
Mehri, S., & Eskenazi, M. (2020). USR: An unsupervised and reference free evaluation metric for dialog generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 681–707). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.64