Abstract
Manual data entry in cancer registries is both time-consuming and prone to error. Although large language models (LLMs) offer promising solutions, prior studies have frequently relied on preprocessed datasets or required complex fine-tuning, limiting their applicability in clinical settings. Here, we assessed the performance of out-of-the-box LLMs on TNM classification tasks using only prompt engineering, without data anonymization or model fine-tuning. We identified manual registry error rates of 5.5–17.0% in a real-world gynecologic cancer registry. Both a cloud-based LLM (Gemini 1.5; T- and N-stage accuracy: 0.994 and 0.993, respectively) and the top-performing local model (Qwen2.5 72B; T- and N-stage accuracy: 0.971 and 0.923, respectively) outperformed existing manual entries in extracting pathological T and N classifications. These models also achieved accuracies of 0.909 and 0.895 in clinical M classification, respectively. Our approach reflects real-world clinical workflows and offers a practical solution for enhancing data integrity in clinical registries using LLMs.
Ishida, K., Murakami, R., Yamanoi, K., Hamada, K., Hasebe, K., Sakurai, A., … Mandai, M. (2025). Real-world application of large language models for automated TNM staging using unstructured gynecologic oncology reports. npj Precision Oncology, 9(1). https://doi.org/10.1038/s41698-025-01157-4