Rapidly Retargetable Approaches to De-identification in Medical Records

Abstract

Objective: This paper describes a successful approach to de-identification that was developed for a recent AMIA-sponsored challenge evaluation.

Method: Our approach focused on rapidly adapting two existing named entity recognition toolkits, Carafe and LingPipe.

Results: The "out of the box" Carafe system achieved a phrase F-measure of 0.9664 after only four hours of work to adapt it to the de-identification task. With further tuning through task-specific feature engineering and the introduction of a lexicon, we reduced the token-level error term by over 36% and achieved a phrase F-measure of 0.9736.

Conclusions: We achieved good performance on the de-identification task by rapidly retargeting existing toolkits. For the Carafe system, we developed a method for tuning the balance of recall vs. precision, as well as a confidence score that correlated well with the measured F-score. © 2007 J Am Med Inform Assoc.
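
The two phrase F-measure figures quoted in the abstract can be related with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the standard F-measure (the weighted harmonic mean of precision and recall) and assumes that the "error term" means 1 minus the F-measure. The function names are hypothetical. Note that the 36% reduction reported in the abstract is measured at the token level, for which the abstract gives no underlying scores, so the phrase-level figure computed here is a different quantity.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def relative_error_reduction(f_before: float, f_after: float) -> float:
    """Fraction by which the error (assumed to be 1 - F) shrinks."""
    err_before = 1.0 - f_before
    err_after = 1.0 - f_after
    return (err_before - err_after) / err_before


# Phrase-level scores quoted in the abstract.
baseline_f = 0.9664  # "out of the box" Carafe
tuned_f = 0.9736     # after feature engineering and the lexicon

phrase_reduction = relative_error_reduction(baseline_f, tuned_f)
print(f"Phrase-level error reduction: {phrase_reduction:.1%}")
```

Under these assumptions the phrase-level error falls from 0.0336 to 0.0264, a relative reduction of roughly 21%.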

Wellner, B., Huyck, M., Mardis, S., Aberdeen, J., Morgan, A., Peshkin, L., … Hirschman, L. (2007). Rapidly Retargetable Approaches to De-identification in Medical Records. Journal of the American Medical Informatics Association, 14(5), 564–573. https://doi.org/10.1197/jamia.M2435
