Abstract
Medical discharge summaries are vital documents in healthcare, but they often contain Personally Identifiable Information (PII), which raises privacy and regulatory-compliance concerns. This article proposes an intelligent data de-identification approach to address this challenge. The approach applies Natural Language Processing (NLP) techniques, chiefly Named Entity Recognition (NER), through a hybrid design that integrates Machine Learning (ML) models, Regular Expression (REGEX)-based recognizers, and extensive lists of names and addresses. The method aims to balance extracting valuable insights from the data against safeguarding sensitive information. Evaluation against benchmarks shows significant improvements in de-identification performance, particularly on discharge summaries. We present results from evaluating the system on synthesized discharge summaries, the OntoNotes dataset, and the CoNLL-2003 dataset, demonstrating its effectiveness in anonymizing diverse text sources, both medical and general-domain.
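The abstract describes the hybrid design only at a high level, so the following is a minimal sketch of how REGEX recognizers, name/address lists, and an ML-based NER component might be combined, not the authors' implementation. The regex patterns, the `MRN` label, the overlap rule (earliest start wins, then longest span), and the `ml_recognizer` hook are all hypothetical; in practice the hook would wrap a trained NER model.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


# Hypothetical REGEX recognizers; illustrative patterns only, not the paper's.
REGEX_RECOGNIZERS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),  # hypothetical record-number format
}


def regex_spans(text: str) -> List[Span]:
    """Collect matches from every pattern-based recognizer."""
    return [
        Span(m.start(), m.end(), label)
        for label, pattern in REGEX_RECOGNIZERS.items()
        for m in pattern.finditer(text)
    ]


def gazetteer_spans(text: str, terms: Iterable[str], label: str) -> List[Span]:
    """Whole-word, case-insensitive lookup of each term in a name/address list."""
    spans = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            spans.append(Span(m.start(), m.end(), label))
    return spans


def merge_spans(spans: List[Span]) -> List[Span]:
    """Resolve overlaps (assumed rule): earliest start wins, ties go to the longest span."""
    ordered = sorted(spans, key=lambda s: (s.start, -(s.end - s.start)))
    kept: List[Span] = []
    for s in ordered:
        if kept and s.start < kept[-1].end:
            continue  # overlaps a span already kept
        kept.append(s)
    return kept


def deidentify(
    text: str,
    names: Iterable[str],
    addresses: Iterable[str],
    ml_recognizer: Optional[Callable[[str], List[Span]]] = None,  # hypothetical hook for an ML NER model
) -> str:
    """Replace every detected PII span with a <LABEL> placeholder."""
    spans = regex_spans(text)
    spans += gazetteer_spans(text, names, "PERSON")
    spans += gazetteer_spans(text, addresses, "ADDRESS")
    if ml_recognizer is not None:
        spans += ml_recognizer(text)
    out, cursor = [], 0
    for s in merge_spans(spans):
        out.append(text[cursor:s.start])
        out.append(f"<{s.label}>")
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


note = "John Smith (MRN: 1234567) was discharged on 03/14/2024 to 12 Oak Street. Call 555-123-4567."
print(deidentify(note, names=["John Smith"], addresses=["12 Oak Street"]))
# → <PERSON> (<MRN>) was discharged on <DATE> to <ADDRESS>. Call <PHONE>.
```

Each recognizer type covers a different weakness: patterns catch rigidly formatted identifiers, lists catch known names and addresses, and an ML model is needed for names absent from any list. Case-insensitive list matching will also flag common words that double as names, which is one reason a hybrid system would typically weigh list hits against model context.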
Citation
Mortadi, A., Nazih, W., Eldesouki, M. I., & Hifny, Y. (2025). Intelligent De-Identification of Medical Discharge Summaries Using Hybrid NLP Techniques. ACM Transactions on Asian and Low-Resource Language Information Processing, 24(5). https://doi.org/10.1145/3724118