De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier

Joaquim Santos; Henrique D.P. dos Santos; Fábio Tabalipa; Renata Vieira

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2021) 13074 LNAI 33-41

DOI: 10.1007/978-3-030-91699-2_3

Abstract

De-identification of clinical notes is crucial for reusing electronic clinical data and is a common application of Named Entity Recognition (NER). Neural language models substantially improve performance on Natural Language Processing (NLP) tasks such as NER when they are combined with neural network architectures. This paper evaluates a current state-of-the-art deep learning method, a bidirectional LSTM with a conditional random field output layer (Bi-LSTM-CRF), on the task of identifying patient names in clinical notes for de-identification. We used two corpora and three language models to determine which combination delivers the best performance. In our experiments, the best result, an F-measure of 0.94, was achieved by the corpus built specifically for de-identification of clinical notes together with a contextualized embedding combined with word embeddings.
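
The abstract reports an F-measure but does not say whether it is computed per token or per entity. As an illustration only, and not the authors' evaluation code, the sketch below assumes patient names are annotated with a BIO tagging scheme under a hypothetical label `NAME`. It extracts entity spans from a tag sequence, scores exact-match spans with precision, recall and F1 (the harmonic mean 2PR/(P+R)), and replaces tagged spans with a placeholder, which is the final de-identification step. The function names and the `[NAME]` placeholder are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (start token index, end token index exclusive, entity label)
# The label "NAME" used in the examples is hypothetical; the abstract gives no tag set.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence, e.g. ["O", "B-NAME", "I-NAME"]."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and tag[2:] == label:
            continue  # token continues the currently open span
        # Any other tag ends the currently open span, if there is one.
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":
            # "B-X" opens a span; a stray "I-X" with no open "X" span is treated like "B-X".
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 (no partial credit)."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)  # a predicted span counts only if start, end and label all match
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mask_entities(tokens: List[str], tags: List[str], placeholder: str = "[NAME]") -> List[str]:
    """Replace every tagged span with a single placeholder token."""
    out: List[str] = []
    i = 0
    for start, end, _ in sorted(bio_to_spans(tags)):
        out.extend(tokens[i:start])  # copy untagged tokens before the span
        out.append(placeholder)
        i = end
    out.extend(tokens[i:])  # copy the remainder after the last span
    return out
```

For example, if the gold tags for `Patient John Smith reports pain` mark `John Smith` as one `NAME` span but a model tags only `John`, `span_f1` returns 0.0 for all three scores, because this exact-match sketch gives no partial credit; a token-level metric would score the same prediction more leniently. Applying `mask_entities` with the gold tags yields `Patient [NAME] reports pain`.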

Citation (APA)

Santos, J., dos Santos, H. D. P., Tabalipa, F., & Vieira, R. (2021). De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13074 LNAI, pp. 33–41). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-91699-2_3