Feasibility and accuracy of extracting cancer stage information from narrative electronic health record data

Abstract

Purpose: Cancer stage, one of the most important prognostic factors for cancer-specific survival, is often documented in narrative form in electronic health records (EHRs). Such documentation results in tedious and time-consuming abstraction efforts by tumor registrars and other secondary users. This information may be amenable to extraction by automated methods.

Methods: We developed a natural language processing algorithm to extract stage statements from machine-readable EHR documents, including automated rules to choose the most likely stage when discordance was present in the EHR. These methods were developed in a training set of patients with lung cancer, independently validated in a test set of patients with lung cancer, and compared with the gold standard of Vanderbilt Cancer Registry-determined stage (when available).

Results: In the combined data set of 2,323 patients (training set, n = 1,103; validation set, n = 1,220), 751,880 documents were analyzed. A stage statement was extracted from 2,239 (98.6%) patient EHRs (median, 24 documents per patient). Stage discordance was common, affecting 83.6% of these EHRs. Nevertheless, algorithmically derived stage accuracy was high in the validation set (κ = 0.906; 95% CI, 0.873 to 0.939), when including notes generated within 14 weeks from diagnosis.

Conclusion: Accurate stage determination can be achieved through automated methods applied to narrative text, despite the frequent presence of discordance in such data. Our results also indicate that stage can be automatically captured in a shorter timeframe than the 6-month window used by cancer registries, as early as 5 weeks from diagnosis. These methods may be generalizable to large narrative cancer data sets.
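To illustrate the general idea of extracting stage statements and resolving discordance, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not describe their extraction patterns or selection rules. The regular expression, the function names (`extract_stages`, `most_likely_stage`), and the "most frequently mentioned stage wins" rule are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not from the paper): matches mentions
# such as "stage IIIA", "Stage 4", or "stage ib".
STAGE_RE = re.compile(r"\bstage\s+(IV|III|II|I|[1-4])([ABC])?\b", re.IGNORECASE)

ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def extract_stages(text):
    """Return normalized stage mentions (e.g. 'IIIA') found in one document."""
    stages = []
    for match in STAGE_RE.finditer(text):
        group = match.group(1).upper()
        group = ARABIC_TO_ROMAN.get(group, group)  # "3" -> "III"
        subgroup = (match.group(2) or "").upper()  # optional A/B/C suffix
        stages.append(group + subgroup)
    return stages


def most_likely_stage(documents):
    """Pick the most frequently mentioned stage across a patient's documents.

    Assumed rule for this sketch: simple frequency count; ties go to the
    stage encountered first. Returns None if no stage statement is found.
    """
    counts = Counter()
    for doc in documents:
        counts.update(extract_stages(doc))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


notes = [
    "Impression: stage IIIA adenocarcinoma of the lung.",
    "Pt with Stage 3a NSCLC, here for follow-up.",
    "Possible stage IV disease pending biopsy.",
]
print(most_likely_stage(notes))
```

In this toy example the three notes disagree (two mentions of IIIA, one of IV), and the frequency rule selects IIIA. A real system would also need to handle negation, historical mentions, TNM notation, and document dates relative to diagnosis.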

Warner, J. L., Levy, M. A., & Neuss, M. N. (2016). Feasibility and accuracy of extracting cancer stage information from narrative electronic health record data. Journal of Oncology Practice, 12(2), e169–e179. https://doi.org/10.1200/JOP.2015.004622
