Intelligent De-Identification of Medical Discharge Summaries Using Hybrid NLP Techniques

Abstract

Medical discharge summaries are vital healthcare documents that often contain Personally Identifiable Information (PII), which raises privacy and regulatory-compliance concerns. This article proposes an intelligent data de-identification approach to address this challenge. It applies Natural Language Processing (NLP) techniques such as Named Entity Recognition (NER) in a hybrid design that integrates Machine Learning (ML) models, regular-expression (REGEX) recognizers, and extensive lists of names and addresses. The method aims to balance extracting valuable insights from the data against safeguarding sensitive information. Evaluation against benchmarks demonstrates significant improvements in de-identification performance, particularly on discharge summaries. We present findings from evaluating the system on synthesized discharge summaries, the OntoNotes dataset, and the CoNLL-2003 dataset, demonstrating its effectiveness in anonymizing diverse medical text sources.
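
To make the hybrid idea in the abstract concrete, here is a minimal Python sketch of how regex recognizers, a name list, and an optional ML recognizer might be combined. It is not the authors' implementation: the patterns, the two-entry name list, the label names, the overlap rule, and the `ml_recognizer` hook are all hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, List, NamedTuple, Optional


class Span(NamedTuple):
    start: int
    end: int
    label: str


# Hypothetical regex recognizers (assumed formats, not from the article).
REGEX_RECOGNIZERS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

# Hypothetical stand-in for the "extensive lists of names and addresses".
NAME_LIST = ["John Smith", "Jane Doe"]


def regex_spans(text: str) -> List[Span]:
    """Find spans matched by the pattern-based recognizers."""
    return [
        Span(m.start(), m.end(), label)
        for label, pattern in REGEX_RECOGNIZERS.items()
        for m in pattern.finditer(text)
    ]


def list_spans(text: str) -> List[Span]:
    """Find case-insensitive, whole-word matches of listed names."""
    spans = []
    for name in NAME_LIST:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        spans.extend(Span(m.start(), m.end(), "NAME") for m in pattern.finditer(text))
    return spans


def merge_spans(spans: List[Span]) -> List[Span]:
    """Sort by start (longest first on ties) and drop overlapping spans."""
    kept: List[Span] = []
    for span in sorted(spans, key=lambda s: (s.start, -(s.end - s.start))):
        if not kept or span.start >= kept[-1].end:
            kept.append(span)
    return kept


def deidentify(
    text: str,
    ml_recognizer: Optional[Callable[[str], List[Span]]] = None,
) -> str:
    """Replace every detected span with a bracketed label placeholder."""
    spans = regex_spans(text) + list_spans(text)
    if ml_recognizer is not None:
        # An ML-based NER model would plug in here (hypothetical hook).
        spans += ml_recognizer(text)
    pieces = []
    cursor = 0
    for span in merge_spans(spans):
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "Patient John Smith was discharged on 03/14/2024. Call 555-123-4567."
    print(deidentify(note))
```

In this sketch the three recognizer types only contribute candidate spans; a single merge step resolves overlaps before redaction. A real system would likely need a more careful conflict policy (for example, confidence scores per recognizer), which the abstract does not specify.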

Cite

Mortadi, A., Nazih, W., Eldesouki, M. I., & Hifny, Y. (2025). Intelligent De-Identification of Medical Discharge Summaries Using Hybrid NLP Techniques. ACM Transactions on Asian and Low-Resource Language Information Processing, 24(5). https://doi.org/10.1145/3724118