Can ChatGPT assist authors with abstract writing in medical journals? Evaluating the quality of scientific abstracts generated by ChatGPT and original abstracts

Abstract

Introduction: ChatGPT, a sophisticated large language model (LLM), has garnered widespread attention for its ability to mimic human-like communication. As recent studies indicate a potential supportive role of ChatGPT in academic writing, we assessed the LLM's capacity to generate accurate and comprehensive scientific abstracts from published Randomised Controlled Trial (RCT) data, focusing on adherence to the Consolidated Standards of Reporting Trials for Abstracts (CONSORT-A) statement, in comparison with the original authors' abstracts.

Methodology: RCTs, identified in a PubMed/MEDLINE search post-September 2021 across various medical disciplines, were subjected to abstract generation via ChatGPT versions 3.5 and 4, following the guidelines of the respective journals. The overall quality score (OQS) of each abstract was determined by the total number of adequately reported components from the 18-item CONSORT-A checklist. Additional outcome measures included percent adherence to each CONSORT-A item, readability, hallucination rate, and regression analysis of reporting-quality determinants.

Results: Original abstracts achieved a mean OQS of 11.89 (95% CI: 11.23–12.54), outperforming GPT 3.5 (7.89; 95% CI: 7.32–8.46) and GPT 4 (5.18; 95% CI: 4.64–5.71). Compared with GPT 3.5 and GPT 4 outputs, original abstracts were more adherent to 10 and 14 CONSORT-A items, respectively. In blind assessments, GPT 3.5-generated abstracts were deemed most readable in 62.22% of cases, significantly more often than the original (31.11%; P = 0.003) and GPT 4-generated (6.67%; P < 0.001) abstracts. Moreover, ChatGPT 3.5 exhibited a hallucination rate of 0.03 items per abstract, compared with 1.13 for GPT 4. No determinants of improved reporting quality were identified for GPT-generated abstracts.

Conclusions: While ChatGPT could generate more readable abstracts, their overall quality was inferior to that of the original abstracts. Yet its ability to concisely relay key information with minimal error holds promise for medical research and warrants further investigation to fully ascertain the LLM's applicability in this domain.

Cite

APA

Hwang, T., Aggarwal, N., Khan, P. Z., Roberts, T., Mahmood, A., Griffiths, M. M., … Khan, S. (2024). Can ChatGPT assist authors with abstract writing in medical journals? Evaluating the quality of scientific abstracts generated by ChatGPT and original abstracts. PLoS ONE, 19(2). https://doi.org/10.1371/journal.pone.0297701