Clinical Research With Large Language Models Generated Writing - Clinical Research with AI-assisted Writing (CRAW) Study

Ivan A. Huespe; Jorge Echeverri; Aisha Khalid; Indalecio Carboni Bisso; Carlos G. Musso; Salim Surani; Vikas Bansal; Rahul Kashyap

Journal Article (Open Access)

Critical Care Explorations (2023) 5(10) E0975

DOI: 10.1097/CCE.0000000000000975

21 Citations

43 Readers

Abstract

IMPORTANCE: The scientific community debates the article quality, authorship merit, originality, and ethical use of Generative Pre-trained Transformer (GPT)-3.5 in scientific writing.

OBJECTIVES: To assess GPT-3.5's ability to craft the background section of a critical care clinical research question compared with medical researchers with H-indices of 22 and 13.

DESIGN: Observational cross-sectional study.

SETTING: Researchers from 20 countries across six continents evaluated the backgrounds.

PARTICIPANTS: Researchers with a Scopus index greater than 1 were included.

MAIN OUTCOMES AND MEASURES: We generated a background section for a critical care clinical research question on "acute kidney injury in sepsis" using three different methods: a researcher with an H-index greater than 20, a researcher with an H-index greater than 10, and GPT-3.5. The three background sections were presented in a blinded survey to researchers with H-indices ranging from 1 to 96. First, the researchers evaluated the main components of each background using a 5-point Likert scale. Second, they were asked to identify which background was written by humans only and which with large language model-generated tools.

RESULTS: A total of 80 researchers completed the survey. The median H-index was 3 (interquartile range, 1-7.25), and the largest group of researchers (36%) was from the Critical Care specialty. When compared with researchers with an H-index of 22 and 13, GPT-3.5 was marked higher on the Likert scale ranking of the main background components (median 4.5 vs. 3.82 vs. 3.6 vs. 4.5, respectively; p < 0.001). The sensitivity and specificity for distinguishing researchers' writing from GPT-3.5 writing were poor: 22.4% and 57.6%, respectively.

CONCLUSIONS AND RELEVANCE: GPT-3.5 could create background research content indistinguishable from the writing of a medical researcher. It was marked higher than medical researchers with an H-index of 22 and 13 in writing the background section of a critical care clinical research question.
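
For readers unfamiliar with the two detection metrics reported in the results, the following is a minimal sketch of how sensitivity and specificity are computed from a two-by-two table of guesses. The abstract does not give the underlying counts or say which class was treated as "positive", so the function name, the choice of positive class, and the counts below are all hypothetical and do not reproduce the study's figures.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Assumption for illustration only: the "positive" class is one of the
    two writing sources (for example, GPT-3.5 text) and the "negative"
    class is the other. The abstract does not specify this mapping.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one observation")
    sensitivity = tp / (tp + fn)  # share of true positives correctly flagged
    specificity = tn / (tn + fp)  # share of true negatives correctly cleared
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the study.
sens, spec = sensitivity_specificity(tp=3, fn=7, tn=6, fp=4)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

A sensitivity well below 50%, as reported in the abstract, means raters missed most items of the positive class, which is consistent with the conclusion that the texts were hard to tell apart.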

Citation (APA)

Huespe, I. A., Echeverri, J., Khalid, A., Carboni Bisso, I., Musso, C. G., Surani, S., … Kashyap, R. (2023). Clinical Research With Large Language Models Generated Writing - Clinical Research with AI-assisted Writing (CRAW) Study. Critical Care Explorations, 5(10), E0975. https://doi.org/10.1097/CCE.0000000000000975