Resilience of clinical text de-identified with “hiding in plain sight” to hostile reidentification attacks by human readers

This article is free to access.

Abstract

Objective: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this “residual PII problem.” HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII. Materials and Methods: Using 2000 representative clinical documents from 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers. Results: Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers. Discussion and Conclusions: Approximately 70% of leaked PII “hiding” in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario, more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.
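
The recall and precision figures in the abstract can be read as follows: recall is the share of truly leaked PII that a reader managed to flag, and precision is the share of the reader's flags that were in fact leaked PII rather than HIPS surrogates. The sketch below illustrates that arithmetic for a single reader. It is not the authors' scoring code; the function name `precision_recall`, the (document, text) span representation, and all example values are assumptions made up for illustration.

```python
def precision_recall(flagged, leaked):
    """Score one reader's reidentification attempt.

    flagged: spans the reader marked as real (leaked) PII
    leaked:  spans that truly are leaked PII (ground truth)
    Spans are hypothetical (document_id, text) pairs.
    """
    flagged, leaked = set(flagged), set(leaked)
    true_positives = len(flagged & leaked)
    # Precision: fraction of the reader's flags that were real leaks.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: fraction of real leaks that the reader found.
    recall = true_positives / len(leaked) if leaked else 0.0
    return precision, recall


# Made-up example: four leaked items, three reader guesses, one correct.
leaked = {
    ("doc1", "Dr. Alvarez"),
    ("doc2", "03/14"),
    ("doc3", "Mercy Clinic"),
    ("doc4", "age 47"),
}
flagged = {
    ("doc1", "Dr. Alvarez"),   # real leak, correctly flagged
    ("doc2", "Dr. Chen"),      # HIPS surrogate mistaken for a leak
    ("doc5", "Jane Roe"),      # HIPS surrogate mistaken for a leak
}

precision, recall = precision_recall(flagged, leaked)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, the roughly 70% of leaked PII described as resilient to detection corresponds to one minus the mean recall across readers.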

Carrell, D. S., Malin, B. A., Cronkite, D. J., Aberdeen, J. S., Clark, C., Li, M., … Hirschman, L. (2020). Resilience of clinical text de-identified with “hiding in plain sight” to hostile reidentification attacks by human readers. Journal of the American Medical Informatics Association, 27(9), 1374–1382. https://doi.org/10.1093/jamia/ocaa095
