A comparison of structured data query methods versus natural language processing to identify metastatic melanoma cases from electronic health records

  • He J
  • Mark L
  • Hilton C
  • et al.

Abstract

Background: With the widespread adoption of electronic health records (EHRs) and advances in natural language processing (NLP) technology, analyzing unstructured data (e.g., narrative medical reports) has become feasible and affordable. It offers an alternative to structured data query for identifying clinical events of interest in large populations.

Objectives: To evaluate the performance of unstructured data analysis using NLP relative to structured data query in identifying metastatic melanoma patients in a large EHR database.

Methods: A retrospective study was conducted using the Indiana Network for Patient Care (INPC) database. The target population included all patients aged 21 years or older who had any clinical records between January 1, 2005, and December 31, 2013. Metastatic melanoma cases were identified by two methods separately: 1) NLP algorithms applied to text reports, and 2) structured data query of diagnosis codes. Manual chart review established the “gold standard” for estimating positive predictive values (PPVs). Each identified case was classified as “definite positive,” “definite negative,” “unsure but possible,” or “unsure but unlikely.” The Indiana Tumor Registry served as an external source of true metastatic melanoma cases for estimating sensitivity.

Results: NLP of text reports and structured data query identified 1,727 and 607 metastatic melanoma cases, respectively. A total of 512 cases were identified by both methods. Using “definite positive” from medical chart review as the gold standard, the PPVs were 74% for NLP vs. 83% for structured data query. When “unsure but possible” was added to the gold standard, the PPVs increased slightly to 80% vs. 84%. The NLP method had much higher sensitivity than the structured data query method (67% vs. 35%).

Conclusions: The NLP method identified nearly three times as many metastatic cancer cases as structured data query, although its chance of a false-positive result was slightly higher. It is a useful tool, alone or together with structured data query, in EHR database research.
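
The case counts reported in the Results can be cross-checked with a little arithmetic. The following is a minimal sketch, not the authors' code: the three counts come from the abstract, while the variable names and the `ppv` and `sensitivity` helpers are illustrative assumptions that simply restate the standard definitions.

```python
# Counts reported in the abstract.
nlp_cases = 1727         # cases identified by NLP of text reports
structured_cases = 607   # cases identified by structured data query
both = 512               # cases identified by both methods

# Derived counts (simple set arithmetic on the reported figures).
nlp_only = nlp_cases - both
structured_only = structured_cases - both
either = nlp_cases + structured_cases - both

# "Nearly three times as many" cases from NLP as from structured query.
ratio = nlp_cases / structured_cases


def ppv(true_positives: int, flagged: int) -> float:
    """Positive predictive value: share of flagged cases confirmed by chart review."""
    return true_positives / flagged


def sensitivity(true_positives: int, reference_cases: int) -> float:
    """Sensitivity: share of reference (e.g. registry) cases that a method found."""
    return true_positives / reference_cases


print(nlp_only, structured_only, either, round(ratio, 2))
```

Under these counts, 1,215 cases were found only by NLP, 95 only by structured data query, and 1,822 by at least one method. The abstract does not report the chart-review or registry counts behind the PPV and sensitivity percentages, so the two helper functions are left without example inputs.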

Cite

He, J., Mark, L., Hilton, C., Martin, J., Baker, J., Duke, J., … Dexter, P. (2019). A comparison of structured data query methods versus natural language processing to identify metastatic melanoma cases from electronic health records. International Journal of Computational Medicine and Healthcare, 1(1), 101. https://doi.org/10.1504/ijcmh.2019.104364
