Leveraging machine-learning techniques to detect recurrences in cancer registry data: A multi-registry validation study using German lung cancer data

Abstract

Background: Cancer recurrence and progression, once regarded as markers of poor prognosis, are now considered manageable aspects of long-term care. Advances in treatment have extended survival, underscoring the need for representative epidemiological information, and population-based cancer registries are essential in this respect. However, tracking treatment outcomes and accurately distinguishing recurrences from progressions remain challenging because follow-up data are often incomplete. To address this and enable meaningful analyses of cancer registry data, we employed machine learning (ML) for precise classification, surpassing approaches based on traditional clinical assumptions.

Methods: We developed an ML model to identify and classify cancer recurrence and progression using lung cancer (ICD-10: C34) data from the Hamburg Cancer Registry. To ensure interoperability, we created a standardized indicator dataset. The model's predictive performance was validated using data from five additional German cancer registries. After extensive evaluation, a histogram-based gradient-boosted decision tree ensemble was chosen for its high accuracy and adaptability.

Results: The model demonstrated strong predictive performance, with areas under the curve (AUC) ranging from 0.74 to 0.99 across the test datasets, indicating robustness and generalizability. Its classification accuracy was comparable to that of experienced human annotators, supporting its reliability for large-scale analyses.

Conclusion: This study highlights the potential of ML to enhance the interpretation of cancer registry data. By reliably identifying recurrences and progressions, our algorithm helps close gaps caused by incomplete reporting. The established framework provides a scalable approach for integrating AI-driven insights into cancer research, improving registry-based outcome analyses and supporting advances in cancer epidemiology.
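
The AUC values reported for each test dataset summarize how well the model's scores rank positive cases above negative ones. The sketch below illustrates that metric only; it is not the authors' evaluation code. It assumes a binary framing (1 = recurrence or progression, 0 = neither) and uses hypothetical registry names and toy scores. It computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy test sets: (true labels, model scores) per registry.
# 1 = recurrence/progression, 0 = neither (an assumed binary framing).
test_sets = {
    "registry_A": ([1, 0, 1, 0, 1, 0], [0.9, 0.2, 0.8, 0.3, 0.7, 0.4]),
    "registry_B": ([1, 0, 1, 0], [0.6, 0.7, 0.8, 0.1]),
}

for name, (y, s) in test_sets.items():
    print(name, round(roc_auc(y, s), 2))
```

On these toy data, registry_A separates perfectly (AUC 1.0), while registry_B ranks three of four pairs correctly (AUC 0.75). This illustrates why AUC can differ across registries even when the same model is applied. The pairwise loop is quadratic and is meant for clarity. For real datasets, a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.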

Kusche, H., Gundler, C., Johanns, O., Sauerberg, M., Wicker, T., Heinrichs, V., … Nennecke, A. (2025). Leveraging machine-learning techniques to detect recurrences in cancer registry data: A multi-registry validation study using German lung cancer data. European Journal of Cancer, 227. https://doi.org/10.1016/j.ejca.2025.115604