Accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in diagnosing oral potentially malignant lesions based on clinical case reports and image recognition

Pragya Pradhan

Journal ArticleOPEN ACCESS

Accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in diagnosing oral potentially malignant lesions based on clinical case reports and image recognition

Pradhan P

Medicina Oral Patologia Oral y Cirugia Bucal (2025) 30(2) e224-e231

DOI: 10.4317/medoral.26824

19Citations

38Readers

Abstract

Background: The accurate and timely diagnosis of oral potentially malignant lesions (OPMLs) is crucial for effective management and prevention of oral cancer. Recent advancements in artificial intelligence technologies indicates its potential to assist in clinical decision-making. Hence, this study was carried out with the aim to evaluate and compare the diagnostic accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in identifying OPMLs. Material and Methods: The analysis was carried out using 42 case reports from PubMed, Scopus and Google Scholar and images from two datasets, corresponding to different OPMLs. The reports were inputted separately for text description-based diagnosis in GPT 3.5, 4.0, 4o and Gemini, and for image recognition-based diagnosis in GPT 4o and Gemini. Two subject-matter experts independently reviewed the reports and offered their evaluations. Results: For text-based diagnosis, among LLMs, GPT 4o got the maximum number of correct responses (27/42), followed by GPT 4.0 (20/42), GPT 3.5 (18/42) and Gemini (15/42). In identifying OPMLs based on image, GPT 4o demonstrated better performance than Gemini. There was fair to moderate agreement found between Large Language Models (LLMs) and subject experts. None of the LLMs matched the accuracy of the subject experts in identifying the correct number of lesions. Conclusions: The results point towards cautious optimism with respect to commonly used LLMs in diagnosing OPMLs. While their potential in diagnostic applications is undeniable, their integration should be approached judiciously.

Author supplied keywords

Cite

CITATION STYLE

APA

Pradhan, P. (2025). Accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in diagnosing oral potentially malignant lesions based on clinical case reports and image recognition. Medicina Oral Patologia Oral y Cirugia Bucal, 30(2), e224–e231. https://doi.org/10.4317/medoral.26824

Accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in diagnosing oral potentially malignant lesions based on clinical case reports and image recognition

Abstract

Author supplied keywords

Cite

Register to see more suggestions