Abstract
The notorious one-to-many nature of open-domain dialogue poses major challenges for automatic evaluation methods. Recent studies attempt to mitigate this issue by considering the similarity of the generated response to the conversational context and by designing discriminative models that learn from multiple positive responses. Despite promising results, these methods cannot be applied to general scenarios where training data with multiple responses is unavailable. To this end, we propose a self-supervised setting that learns a smooth latent space, one that both captures discourse-level context information and implicitly models additional references in the latent space. Specifically, we present EMS, an Enhanced dialogue evaluation Metric in latent Space. Experimental results on two real-world dialogue datasets confirm the superiority of our method for open-domain dialogue evaluation: both Pearson and Spearman correlations with human judgments outperform all baselines.
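As a point of reference for the reported results, metric-vs-human agreement of this kind is conventionally measured by correlating a metric's per-response scores with human ratings. The sketch below is not the authors' code; it only illustrates how Pearson and Spearman correlations are typically computed with SciPy, using hypothetical placeholder arrays `metric_scores` and `human_scores`.

```python
# Minimal sketch (assumption, not the EMS implementation): computing the
# Pearson and Spearman correlations between automatic metric scores and
# human judgments, as reported in dialogue-evaluation papers.
from scipy.stats import pearsonr, spearmanr

metric_scores = [0.72, 0.31, 0.88, 0.45, 0.60]  # hypothetical scores from an automatic metric
human_scores = [4.0, 2.0, 5.0, 3.0, 3.5]        # hypothetical human ratings for the same responses

pearson_r, pearson_p = pearsonr(metric_scores, human_scores)
spearman_rho, spearman_p = spearmanr(metric_scores, human_scores)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3f})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3f})")
```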
Citation
Chan, Z., Liu, L., Li, J., Zhang, H., Zhao, D., Shi, S., & Yan, R. (2021). Enhancing the Open-Domain Dialogue Evaluation in Latent Space. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4889–4900). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.432