Abstract
As conversational AI-based dialogue management has become an increasingly prominent research topic, the need for a standardized and reliable evaluation procedure grows ever more pressing. Various evaluation protocols are currently in use to assess chat-oriented dialogue management systems, making it difficult to conduct fair comparative studies across different approaches and to gain an insightful understanding of their relative merits. To foster this line of research, a more robust evaluation protocol must be put in place. This paper presents a comprehensive synthesis of both automated and human evaluation methods for dialogue systems, identifying their shortcomings and accumulating evidence towards the most effective evaluation dimensions. A total of 20 papers from the last two years are surveyed to analyze three types of evaluation protocols: automated, static, and interactive. Finally, the evaluation dimensions used in these papers are compared against our expert evaluation of the system-user dialogue data collected from the Alexa Prize 2020.
Citation
Finch, S. E., & Choi, J. D. (2020). Towards unified dialogue system evaluation: A comprehensive analysis of current evaluation protocols. In SIGDIAL 2020 - 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue, Proceedings of the Conference (pp. 236–245). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.sigdial-1.29