Validity of evidence-based recommendations by a large language model for interdisciplinary board decisions in neurooncology: An explorative study and critical evaluation



Abstract

Objectives: This study evaluates the stylistic and structural equivalence of Artificial Intelligence (AI)-generated summaries, particularly those produced by Large Language Models (LLMs) such as ChatGPT, against traditional human-written case summaries for neuro-oncological board decisions. The primary goal is to assess how closely AI-generated summaries of board meeting audio recordings align stylistically with human-authored ones.

Methods: 30 traditional human-written case summaries were compared with 30 AI-generated summaries based on board meeting audio recordings. Two expert raters, blinded to the source of each summary, evaluated the 60 cases in total, using a Likert scale to rate plausibility, linguistic style, adherence to evidence, and reference accuracy.

Results: Both LLM-generated and human-reviewed summaries performed consistently well on all criteria. General plausibility ratings were comparable (LLM: 4.70, Human: 4.73, P = .959), as were linguistic style ratings (LLM: 4.87, Human: 4.97, P = .512) and adherence to evidence (LLM: 4.80, Human: 4.87, P = .541). Reference accuracy was slightly higher for the AI-generated summaries (LLM: 4.97, Human: 4.90, P = .664). The second rater's scores were consistent with these findings, and statistical analysis using Kendall's tau showed no significant differences between the two methods (P > .05).

Conclusion: LLM-generated summaries can effectively emulate the style and structure of human-authored ones, indicating their promise as an additional tool in neuro-oncology. Such models can enhance documentation quality and serve as valuable support in clinical settings. While further research is necessary to explore broader applications, LLMs offer clear potential as a complement to traditional decision-making processes.
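The between-rater consistency check in the abstract relies on Kendall's tau, a rank correlation suited to ordinal data such as Likert-scale ratings. As an illustration only (this is not the authors' analysis code, and the ratings below are hypothetical), the tie-corrected tau-b statistic can be sketched in pure Python:

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation with a tie correction,
    appropriate for ordinal data such as Likert-scale ratings."""
    assert len(x) == len(y)
    concordant = discordant = tied_x = tied_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            tied_x += 1          # pair tied on the first rating series
        if dy == 0:
            tied_y += 1          # pair tied on the second rating series
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1  # both series order the pair the same way
            else:
                discordant += 1
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    return (concordant - discordant) / sqrt((n0 - tied_x) * (n0 - tied_y))

# Hypothetical Likert ratings (1-5) from two raters of the same summaries
rater_1 = [5, 4, 5, 5, 4, 3, 5, 4]
rater_2 = [5, 4, 4, 5, 4, 3, 5, 5]
print(round(kendall_tau_b(rater_1, rater_2), 3))
```

In practice, `scipy.stats.kendalltau` computes the same tau-b statistic along with a p-value for significance testing.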

Citation (APA)

Goldberg, M., Eisenkolb, V. M., Aftahy, A. K., Negwer, C., Meyer, H. S., Gempt, J., … Wagner, A. (2025). Validity of evidence-based recommendations by a large language model for interdisciplinary board decisions in neurooncology: An explorative study and critical evaluation. Digital Health, 11. https://doi.org/10.1177/20552076251384604
