Evaluating the Appropriateness, Consistency, and Readability of ChatGPT in Critical Care Recommendations

Abstract

Background: We assessed 2 versions of the large language model (LLM) ChatGPT, versions 3.5 and 4.0, in generating appropriate, consistent, and readable recommendations on core critical care topics.

Research Question: How do successive large language models compare in generating appropriate, consistent, and readable recommendations on core critical care topics?

Design and Methods: Fifty LLM-generated responses to clinical questions were evaluated by 2 independent intensivists on a 5-point Likert scale for appropriateness, consistency, and readability.

Results: ChatGPT 4.0 showed significantly higher median appropriateness scores compared to ChatGPT 3.5 (4.0 vs 3.0, P
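As a rough illustration of the kind of comparison described in the methods, the sketch below contrasts 5-point Likert ratings for two model versions by their medians and a non-parametric test. The rating values are invented for illustration, and the choice of a Mann-Whitney U test is an assumption; the abstract reports medians but does not name the statistical test used.

```python
# Hypothetical sketch of comparing Likert-scale ratings for two LLM versions.
# Values are illustrative, not the study's actual data.
from statistics import median
from scipy.stats import mannwhitneyu

# 1-5 Likert appropriateness ratings, one per evaluated response (illustrative)
gpt35_scores = [3, 2, 4, 3, 3, 2, 4, 3, 3, 2]
gpt40_scores = [4, 5, 4, 3, 4, 5, 4, 4, 3, 5]

print("Median (ChatGPT 3.5):", median(gpt35_scores))
print("Median (ChatGPT 4.0):", median(gpt40_scores))

# Non-parametric comparison of the two rating distributions
# (Mann-Whitney U is one common choice for ordinal Likert data)
stat, p_value = mannwhitneyu(gpt40_scores, gpt35_scores, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.3f}")
```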

Citation (APA)

Balta, K. Y., Javidan, A. P., Walser, E., Arntfield, R., & Prager, R. (2024). Evaluating the Appropriateness, Consistency, and Readability of ChatGPT in Critical Care Recommendations. Journal of Intensive Care Medicine. https://doi.org/10.1177/08850666241267871
