BoB, a best-of-breed automated text de-identification system for VHA clinical documents

Oscar Ferrández; Brett R. South; Shuying Shen; F. Jeffrey Friedlin; Matthew H. Samore; Stéphane M. Meystre

Journal Article, Open Access

Journal of the American Medical Informatics Association (2013) 20(1) 77-83

DOI: 10.1136/amiajnl-2012-001020

Abstract

Objective: De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents.

Materials and methods: We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible.

Results: We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance and token levels, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation.

Discussion: BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives.

Conclusions: Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.

Cite

Ferrández, O., South, B. R., Shen, S., Friedlin, F. J., Samore, M. H., & Meystre, S. M. (2013). BoB, a best-of-breed automated text de-identification system for VHA clinical documents. Journal of the American Medical Informatics Association, 20(1), 77–83. https://doi.org/10.1136/amiajnl-2012-001020
