ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments

Abstract

Existing automatic evaluation systems for chatbots mostly rely on static chat scripts as ground truth, which are hard to obtain and require access to the bots' models as a form of “white-box testing”. Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other as in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently of their model architectures and the domains for which they are trained.
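To illustrate the tournament idea described in the abstract, the following is a minimal sketch, not the authors' implementation: it assumes a round-robin pairing of bots, a fixed number of alternating turns per match, and a pluggable judge function standing in for the paper's flexible scoring metrics. All names (`play_match`, `run_tournament`, the toy bots and judge) are hypothetical.

```python
# Hedged sketch of a round-robin chat tournament with a pluggable judge.
# Bot and judge behaviours below are placeholders, not ChatMatch's actual code.
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Bot = Callable[[List[str]], str]                      # dialogue history -> next utterance
Judge = Callable[[List[str]], Tuple[float, float]]    # dialogue -> (score for bot A, score for bot B)


def play_match(bot_a: Bot, bot_b: Bot, turns: int = 6) -> List[str]:
    """Let two bots converse for a fixed number of alternating turns."""
    history: List[str] = []
    speakers = [bot_a, bot_b]
    for t in range(turns):
        history.append(speakers[t % 2](history))
    return history


def run_tournament(bots: Dict[str, Bot], judge: Judge, turns: int = 6) -> List[Tuple[str, float]]:
    """Round-robin: every pair of bots plays once; the judge's scores accumulate into a ranking."""
    totals = {name: 0.0 for name in bots}
    for (name_a, bot_a), (name_b, bot_b) in combinations(bots.items(), 2):
        dialogue = play_match(bot_a, bot_b, turns)
        score_a, score_b = judge(dialogue)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy bots and a toy judge (rewards longer replies) purely to make the sketch runnable.
    echo_bot: Bot = lambda h: (h[-1] if h else "hello") + "!"
    terse_bot: Bot = lambda h: "ok"
    length_judge: Judge = lambda d: (
        sum(len(u) for u in d[0::2]) / max(len(d), 1),   # bot A speaks on even turns
        sum(len(u) for u in d[1::2]) / max(len(d), 1),   # bot B speaks on odd turns
    )
    for name, score in run_tournament({"echo": echo_bot, "terse": terse_bot}, length_judge):
        print(f"{name}: {score:.2f}")
```

Because the judge is just a function over the dialogue, it can be swapped for any black-box scoring metric, which is what makes this style of evaluation independent of the bots' model architectures.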

Cite

APA

Yang, R., Li, Z., Tang, H., & Zhu, K. Q. (2022). ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 7579–7590). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.522
