Developing a Clinical Language Model for Swedish: Continued Pretraining of Generic BERT with In-Domain Data

Abstract

The use of pretrained language models, fine-tuned to perform a specific downstream task, has become widespread in NLP. Using a generic language model in specialized domains may, however, be sub-optimal due to differences in language use and vocabulary. In this paper, we investigate whether an existing, generic language model for Swedish can be improved for the clinical domain through continued pretraining on clinical text. The generic and domain-adapted language models are fine-tuned and evaluated on three representative clinical NLP tasks: (i) identifying protected health information, (ii) assigning ICD-10 diagnosis codes to discharge summaries, and (iii) sentence-level uncertainty prediction. The results show that continued pretraining on in-domain data improves performance on all three downstream tasks, indicating the potential added value of domain-specific language models for clinical NLP.
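The paper's exact pretraining setup is not reproduced here. The sketch below is a minimal illustration of continued pretraining with the standard masked language modeling objective, assuming the Hugging Face transformers and datasets libraries, the public generic Swedish BERT checkpoint KB/bert-base-swedish-cased, and a hypothetical plain-text clinical corpus clinical_notes.txt (one note per line); the hyperparameters are placeholders, not the authors' configuration.

```python
# Minimal sketch: continued pretraining of a generic Swedish BERT on clinical text
# via masked language modeling (MLM). Checkpoint name, corpus file, and
# hyperparameters are illustrative assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

model_name = "KB/bert-base-swedish-cased"  # generic Swedish BERT (assumed starting point)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical in-domain corpus: one clinical note per line.
dataset = load_dataset("text", data_files={"train": "clinical_notes.txt"})

def tokenize(batch):
    # Truncate to BERT's maximum input length.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, as in standard BERT pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="swedish-clinical-bert",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("swedish-clinical-bert")
```

The resulting domain-adapted checkpoint can then be fine-tuned on downstream tasks (e.g., token classification for protected health information, or sentence/document classification for ICD-10 coding and uncertainty prediction) in the same way as the original generic model.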

Citation (APA)

Lamproudis, A., Henriksson, A., & Dalianis, H. (2021). Developing a Clinical Language Model for Swedish: Continued Pretraining of Generic BERT with In-Domain Data. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021) (pp. 790–797). INCOMA Ltd. https://doi.org/10.26615/978-954-452-072-4_090