DynaEval: Unifying turn and dialogue level evaluation

Chen Zhang; Yiming Chen; Luis Fernando D'Haro; Yan Zhang; Thomas Friedrichs; Grandee Lee; Haizhou Li

Conference Proceedings

ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (2021) 1 5676-5689

DOI: 10.18653/v1/2021.acl-long.441

49 Citations

99 Readers

Abstract

A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics are focused very much on the turn-level quality, while ignoring such dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework which is not only capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. In DynaEval, the graph convolutional network (GCN) is adopted to model a dialogue in totality, where the graph nodes denote each individual utterance and the edges represent the dependency between pairs of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model, and correlates strongly with human judgements across multiple dialogue evaluation aspects at both turn and dialogue level.

Cite

APA

Zhang, C., Chen, Y., D’Haro, L. F., Zhang, Y., Friedrichs, T., Lee, G., & Li, H. (2021). DynaEval: Unifying turn and dialogue level evaluation. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (Vol. 1, pp. 5676–5689). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-long.441
