Comparative Analysis of Large Language Model and Physician-Generated Responses in Bariatric Patient Inquiries: Assessing the Accuracy and Patient Satisfaction

This article is free to access.

Abstract

Background: Large language models (LLMs) can generate human-like, empathetic responses within seconds. Their potential to support physician–patient communication in bariatric surgery care, in terms of comprehensibility, empathy, and completeness, needs to be evaluated.

Methods: We collected 200 real-world questions from patient support groups, initial consultations, and follow-up visits, which were answered by GPT-4o and by two human bariatric experts. An independent bariatric expert, blinded to the source, then evaluated the responses for overall quality, accuracy, and comprehensiveness. Where needed, the responses were corrected, and the correction time was documented. Afterwards, bariatric patients (n = 189) across Germany rated each response on clarity, empathy, and completeness.

Results: The LLM required significantly less time (2.7 vs. 87.2 s, p < 0.0001) and generated longer responses (607 vs. 262 characters, p = 0.001) than the human experts. Patients rated LLM-generated responses significantly higher for clarity (4.8 vs. 4.6), completeness (4.5 vs. 3.4), and empathy (4.1 vs. 3.2; all p < 0.0001). In total, 64.9% of patients preferred LLM-generated responses, while 18.5% preferred physician responses. Notably, patients with a lower level of education showed a stronger preference for LLM responses over physician responses.

Conclusion: LLMs could act as assistants for physicians, improving response efficiency while maintaining accuracy under physician oversight. This approach could optimize physician time management and enhance patient satisfaction in bariatric care communication.

Cite

Vedder, K., Blank, S., Wilhelm, T., Fidan, D., Pachkiv, I., Cao, H., … Yang, C. (2025). Comparative Analysis of Large Language Model and Physician-Generated Responses in Bariatric Patient Inquiries: Assessing the Accuracy and Patient Satisfaction. Obesity Surgery, 35(9), 3801–3809. https://doi.org/10.1007/s11695-025-08115-w
