This paper evaluates the performance of three chatbots that have been made publicly available online: IRIS, TickTock and Joker. All three are retrieval-based, chat-oriented dialogue systems designed to engage users in open-ended conversations for as long as possible. They employ different approaches to produce relevant and valid responses, and they continually apply conversational strategies to automatically improve their own systems through machine learning. Our analysis of annotations for more than 2000 responses across the three chatbots allowed us to confirm the robustness, scalability and usability of the systems, to identify a few areas where response accuracy was lacking, and to propose future work to further improve both the three systems and the annotation scheme.
Kong-Vega, N., Shen, M., Wang, M., & D’Haro, L. F. (2019). Subjective annotation and evaluation of three different chatbots WOCHAT: Shared task report. In Lecture Notes in Electrical Engineering (Vol. 579, pp. 371–378). Springer. https://doi.org/10.1007/978-981-13-9443-0_32