This paper evaluates the performance of three chatbots that have been made publicly available online: IRIS, TickTock and Joker. All three are retrieval-based, chat-oriented dialogue systems designed to engage users in open-ended conversations for as long as possible. They employ different approaches to produce relevant and valid responses, and they continually apply conversational strategies to automatically improve their own systems through machine learning. Our analysis of annotations for more than 2000 responses across the three chatbots allowed us to confirm the robustness, scalability and usability of the systems, to identify a few areas where response accuracy was lacking, and to propose future work to further improve both the three systems and the annotation scheme.
Kong-Vega, N., Shen, M., Wang, M., & D’Haro, L. F. (2019). Subjective annotation and evaluation of three different chatbots WOCHAT: Shared task report. In Lecture Notes in Electrical Engineering (Vol. 579, pp. 371–378). Springer. https://doi.org/10.1007/978-981-13-9443-0_32