Comparative analysis of large language models as decision support tools in oral pathology

This article is free to access.

Abstract

This study evaluated four large language model (LLM)-based chatbots (ChatGPT-4.0, ChatGPT o1-preview, Gemini, and Meta AI) as decision-support systems for interpreting histopathologic descriptions of oral lesions, assessing agreement between their outputs and the pathologists' reference diagnoses. Each chatbot generated a suggested primary interpretation and three differential diagnoses. Outputs were categorized as Different, Similar, or Correct relative to the consensus reference diagnosis established by two board-certified pathologists. Statistical analyses included the Friedman test to compare model performance, Wilcoxon signed-rank tests for pairwise comparisons, Cohen's κ to assess agreement, and regression analyses to evaluate the influence of patient age and sex. Differential diagnosis performance was also analyzed. ChatGPT o1-preview produced the highest proportion of outputs concordant with the reference diagnosis (68.6%), followed by Meta AI (65.7%), ChatGPT-4.0 (59.8%), and Gemini (27.5%). In terms of agreement with the oral pathologists, ChatGPT o1-preview (κ = 0.66) and Meta AI (κ = 0.63) showed substantial agreement, ChatGPT-4.0 showed moderate agreement (κ = 0.57), and Gemini showed poor agreement (κ = 0.24). Increasing patient age was associated with a mild but statistically significant reduction in performance for ChatGPT-4.0, Meta AI, and Gemini, whereas no significant age effect was observed for ChatGPT o1-preview; patient sex had no significant effect. Among the evaluated chatbots, ChatGPT o1-preview aligned most closely with the oral pathologists' reference diagnoses. These findings support a potential role for LLMs as complementary decision-support tools for interpreting oral histopathology descriptions, while highlighting substantial inter-model variability and the need for cautious implementation with continued human oversight.
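
For readers unfamiliar with Cohen's κ, the agreement statistic reported above, the following is a minimal sketch of the standard unweighted formula, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance. It is not the authors' analysis code: the function name `cohens_kappa` and the toy labels `A`, `B`, `C` are illustrative assumptions, and the abstract does not state whether a weighted or unweighted κ was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (NOT from the study): placeholder diagnosis labels.
reference = ["A", "A", "B", "B", "C", "C"]  # stand-in for the reference diagnoses
model = ["A", "A", "B", "C", "C", "A"]      # stand-in for one chatbot's outputs

kappa = cohens_kappa(reference, model)
print(round(kappa, 2))  # 0.5
```

In this toy case the raters agree on 4 of 6 items (p_o ≈ 0.67) while chance agreement is p_e ≈ 0.33, giving κ = 0.5. The example shows why κ can be noticeably lower than the raw percentage of concordant outputs.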

Cite

Alvarez-Silberberg, V. I., Gil-Manich, V., Cuevas-Nunez, M., Alvarez-Silberberg, V. I., Alvarez-Silberberg, C. P., Ramirez, V., … Cuevas-Nunez, M. (2026). Comparative analysis of large language models as decision support tools in oral pathology. Scientific Reports, 16(1). https://doi.org/10.1038/s41598-026-41533-z
