Comparative Analysis of Large Language Model and Physician-Generated Responses in Bariatric Patient Inquiries: Assessing the Accuracy and Patient Satisfaction

Katharina Vedder; Susanne Blank; Tea Wilhelm; Darick Fidan; Ihor Pachkiv; Han Cao; Christel Weiß; Chengpeng Li; Marion Rung-Friebe; Christoph Reissfelder; Mirko Otto; Cui Yang

Journal Article, Open Access

Obesity Surgery (2025) 35(9) 3801-3809

DOI: 10.1007/s11695-025-08115-w

Abstract

Background: Large language models (LLMs) can generate human-like, empathetic responses within seconds. Their potential to support physician–patient communication in bariatric surgery care, in terms of comprehensibility, empathy, and completeness, needs to be evaluated.

Methods: We collected 200 real-world questions from patient support groups, initial consultations, and follow-up visits, which were answered by GPT-4o and two human bariatric experts. An independent bariatric expert then blindly evaluated the responses for their overall quality, accuracy, and comprehensiveness. If needed, the responses were corrected, and the correction time was documented. Afterwards, bariatric patients (n = 189) across Germany rated the responses, assessing each one on its clarity, empathy, and completeness.

Results: The LLM required significantly less time (2.7 vs. 87.2 s, p < 0.0001) and generated longer responses (607 vs. 262 characters, p = 0.001) than human experts. LLM-generated responses were rated significantly higher by patients in terms of clarity (4.8 vs. 4.6), completeness (4.5 vs. 3.4), and empathy (4.1 vs. 3.2, all p < 0.0001). In total, 64.9% of patients preferred LLM-generated responses, while 18.5% preferred physician responses. Notably, patients with a lower degree of education showed a stronger preference for LLM responses over physician responses.

Conclusion: LLMs could act as assistants for physicians and help improve their response efficiency while maintaining accuracy under physicians’ oversight. This approach could optimize physician time management and enhance patient satisfaction in bariatric care communication.

Citation (APA)

Vedder, K., Blank, S., Wilhelm, T., Fidan, D., Pachkiv, I., Cao, H., … Yang, C. (2025). Comparative Analysis of Large Language Model and Physician-Generated Responses in Bariatric Patient Inquiries: Assessing the Accuracy and Patient Satisfaction. Obesity Surgery, 35(9), 3801–3809. https://doi.org/10.1007/s11695-025-08115-w
