Enhancing Case Capture, Quality, and Completeness of Primary Melanoma Pathology Records via Natural Language Processing

  • Malke J
  • Jin S
  • Camp S
  • et al.

Abstract

PURPOSE: Medical records contain a wealth of useful, informative data points valuable for clinical research. Most data points are stored in semistructured or unstructured legacy documents and require manual data abstraction into a structured format to render the information more readily accessible, searchable, and generally analysis ready. The substantial labor needed for this can be cost prohibitive, particularly when dealing with large patient cohorts.

METHODS: To establish a high-throughput approach to data abstraction, we developed a novel framework using natural language processing (NLP) and a decision-rules algorithm to extract, transform, and load (ETL) melanoma primary pathology features from pathology reports in an institutional legacy electronic medical record system into a structured database. We compared a subset of these data with a manually curated data set comprising the same patients and developed a novel scoring system to assess confidence in records generated by the algorithm, thus obviating manual review of high-confidence records while flagging specific, low-confidence records for review.

RESULTS: The algorithm generated 368,624 individual melanoma data points comprising 16 primary tumor prognostic factors and metadata from 23,039 patients. From these data points, a subset of 147,872 was compared with an existing, manually abstracted data set, demonstrating an exact or synonymous match between 90.4% of all data points. Additionally, the confidence-scoring algorithm demonstrated an error rate of only 3.7%.

CONCLUSION: Our NLP platform can identify and abstract melanoma primary prognostic factors with accuracy comparable to that of manual abstraction (< 5% error rate), with vastly greater efficiency. Principles used in the development of this algorithm could be expanded to include other melanoma-specific data points as well as disease-agnostic fields and further enhance capture of essential elements from nonstructured data.
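
The "exact or synonymous match" comparison reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synonym table, the field names (`ulceration`, `breslow_mm`), the record keys, and the functions `normalize` and `match_rate` are all hypothetical, chosen only to show how an algorithm-generated value might be counted as agreeing with a manually abstracted one.

```python
from typing import Dict, Optional, Tuple

# Hypothetical synonym table (an assumption, not from the paper):
# maps variant wordings to one canonical value.
SYNONYMS: Dict[str, str] = {
    "absent": "no",
    "not identified": "no",
    "present": "yes",
    "identified": "yes",
}

Key = Tuple[str, str]  # (patient_id, field_name), both illustrative


def normalize(value: Optional[str]) -> str:
    """Lowercase, trim, and map known synonyms to a canonical form."""
    text = (value or "").strip().lower()
    return SYNONYMS.get(text, text)


def match_rate(extracted: Dict[Key, str], curated: Dict[Key, str]) -> float:
    """Fraction of shared data points that match exactly or by synonym."""
    compared = 0
    matched = 0
    for key, manual_value in curated.items():
        if key not in extracted:
            continue  # only compare data points present in both sets
        compared += 1
        if normalize(extracted[key]) == normalize(manual_value):
            matched += 1
    return matched / compared if compared else 0.0


# Toy data: four data points, one genuine disagreement.
extracted = {
    ("p1", "ulceration"): "Absent",
    ("p1", "breslow_mm"): "1.2",
    ("p2", "ulceration"): "present",
    ("p2", "breslow_mm"): "0.8",
}
curated = {
    ("p1", "ulceration"): "No",
    ("p1", "breslow_mm"): "1.2",
    ("p2", "ulceration"): "Yes",
    ("p2", "breslow_mm"): "0.9",
}

rate = match_rate(extracted, curated)
print(f"match rate: {rate:.1%}")
```

On this toy data, three of the four data points agree once synonyms are normalized. A real pipeline would need a much richer, clinically reviewed synonym table and field-specific comparison rules.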

Cite

Malke, J. C., Jin, S., Camp, S. P., Lari, B., Kell, T., Simon, J. M., … Haydu, L. E. (2019). Enhancing Case Capture, Quality, and Completeness of Primary Melanoma Pathology Records via Natural Language Processing. JCO Clinical Cancer Informatics, (3), 1–11. https://doi.org/10.1200/cci.19.00006
