De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier

Abstract

The de-identification of clinical notes is crucial for the reuse of electronic clinical data and is a common Named Entity Recognition (NER) task. Neural language models yield substantial improvements in Natural Language Processing (NLP) tasks such as NER when they are integrated with neural network methods. This paper evaluates current state-of-the-art deep learning methods (Bi-LSTM-CRF) on the task of identifying patient names in clinical notes for de-identification purposes. We used two corpora and three language models to evaluate which combination delivers the best performance. In our experiments, the corpus built specifically for the de-identification of clinical notes, combined with a contextualized embedding used together with word embeddings, achieved the best result: an F-measure of 0.94.
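To make the task concrete, here is a minimal Python sketch of what happens downstream of a token classifier: the per-token labels are turned into name spans, the spans are masked, and an F-measure is computed against gold labels. It does not reproduce the paper's Bi-LSTM-CRF model or its evaluation protocol. The `B-NAME`/`I-NAME` tag names, the `[NAME]` placeholder, the exact-span matching criterion, and the example sentence are all assumptions made for illustration.

```python
def extract_spans(tags):
    """Return the set of (start, end) token spans tagged as names.

    Assumes a BIO scheme with hypothetical labels "B-NAME", "I-NAME", "O".
    `end` is exclusive. A stray "I-NAME" with no preceding "B-NAME" is ignored.
    """
    spans = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-NAME":
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "I-NAME" and start is not None:
            continue  # still inside the current name span
        else:
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def mask_names(tokens, tags, placeholder="[NAME]"):
    """Replace each predicted name span with a single placeholder token."""
    out = []
    prev_end = 0
    for start, end in sorted(extract_spans(tags)):
        out.extend(tokens[prev_end:start])
        out.append(placeholder)
        prev_end = end
    out.extend(tokens[prev_end:])
    return " ".join(out)


def f_measure(gold_tags, pred_tags):
    """Span-level F1: a predicted span counts only if it matches a gold span exactly."""
    gold = extract_spans(gold_tags)
    pred = extract_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example sentence; not taken from either corpus used in the paper.
tokens = ["Patient", "Maria", "Silva", "was", "seen", "by", "Dr.", "Ana", "Souza", "."]
gold = ["O", "B-NAME", "I-NAME", "O", "O", "O", "O", "B-NAME", "I-NAME", "O"]
pred = ["O", "B-NAME", "I-NAME", "O", "O", "O", "O", "B-NAME", "O", "O"]

masked = mask_names(tokens, pred)
score = f_measure(gold, pred)
```

In this example the second name is only partially predicted, so the masked text still leaks a surname and the span-level score drops. That is why recall on name spans matters for de-identification.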

Cite

Santos, J., dos Santos, H. D. P., Tabalipa, F., & Vieira, R. (2021). De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13074 LNAI, pp. 33–41). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-91699-2_3
