Comparative Analysis of Chatbot Systems

Abstract

Existing research on chatbot evaluation suffers from inconsistent assessment standards, fragmented criteria, and insufficient coverage of critical dimensions such as legal compliance and ethical alignment, all of which hinder reliable benchmarking of chatbot performance. Our study proposes a comprehensive evaluation framework and systematically compares five chatbot systems: Tidio (Rule-Based), GPT-4o (AI-Powered), Claude 3.5 Sonnet (LLM), Watson Assistant (Enterprise), and Qwen2.5-Max (Multilingual). The comparison covers accuracy, safety, legal compliance, generalizability of performance, and ethical alignment. We conclude that although chatbots enhance efficiency in healthcare (97.34% patient education completeness) and e-commerce (30% to 40% cost reduction), critical limitations persist. Our recommendations are: (1) retrieval-augmented generation (RAG) to reduce hallucination, (2) ethical governance frameworks (e.g., AILuminate), and (3) domain-specialized tuning. Cross-sector collaboration and standardized evaluations are essential for the responsible deployment of AI.

Citation (APA)

Xu, H., Wan, L., Li, Y., Liu, J., & Lau, A. S. M. (2025). Comparative Analysis of Chatbot Systems. In Frontiers in Artificial Intelligence and Applications (Vol. 412, pp. 392–398). IOS Press BV. https://doi.org/10.3233/FAIA250737
