Embedding strategies for specialized domains: Application to clinical entity recognition

Hicham El Boukkouri; Olivier Ferret; Thomas Lavergne; Pierre Zweigenbaum

Conference Proceedings, Open Access

ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (2019) 295-301

DOI: 10.18653/v1/p19-2041

Abstract

Using pre-trained word embeddings in conjunction with Deep Learning models has become the de facto approach in Natural Language Processing (NLP). While this usually yields satisfactory results, off-the-shelf word embeddings tend to perform poorly on texts from specialized domains such as clinical reports. Moreover, training specialized word representations from scratch is often either impossible or ineffective due to the lack of large enough in-domain data. In this work, we focus on the clinical domain for which we study embedding strategies that rely on general-domain resources only. We show that by combining off-the-shelf contextual embeddings (ELMo) with static word2vec embeddings trained on a small in-domain corpus built from the task data, we manage to reach and sometimes outperform representations learned from a large corpus in the medical domain.
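The abstract does not spell out how the contextual (ELMo) and static (word2vec) representations are combined. One common approach, assumed here for illustration only, is per-token concatenation: each token's contextual vector is joined with its static vector, and tokens missing from the small in-domain word2vec vocabulary receive a zero vector. The helper `combine_embeddings`, the zero-vector fallback, and the toy 2- and 3-dimensional vectors are all hypothetical stand-ins, not the authors' code or real model outputs.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def combine_embeddings(
    tokens: Sequence[str],
    contextual: Sequence[Vector],
    static_table: Dict[str, Vector],
    static_dim: int,
) -> List[Vector]:
    """Concatenate each token's contextual vector with its static vector.

    Assumption: out-of-vocabulary tokens (absent from the static table)
    get an all-zero static vector; other fallback strategies are possible.
    """
    if len(tokens) != len(contextual):
        raise ValueError("one contextual vector is required per token")
    zero = [0.0] * static_dim
    combined = []
    for token, ctx_vec in zip(tokens, contextual):
        # Look up the static (e.g. word2vec-style) vector, or fall back to zeros.
        static_vec = static_table.get(token, zero)
        if len(static_vec) != static_dim:
            raise ValueError(f"static vector for {token!r} has wrong size")
        combined.append(list(ctx_vec) + list(static_vec))
    return combined


# Toy usage: 2-d vectors stand in for contextual outputs,
# 3-d vectors for static embeddings trained on a small in-domain corpus.
tokens = ["patient", "denies", "dyspnea"]
contextual = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
static = {"patient": [1.0, 0.0, 0.0], "dyspnea": [0.0, 1.0, 0.0]}
for tok, vec in zip(tokens, combine_embeddings(tokens, contextual, static, 3)):
    print(tok, vec)
```

Concatenation keeps both signals intact and lets the downstream tagger learn how to weight them; the cost is a larger input dimension for the entity recognition model.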

Citation (APA)

El Boukkouri, H., Ferret, O., Lavergne, T., & Zweigenbaum, P. (2019). Embedding strategies for specialized domains: Application to clinical entity recognition. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Student Research Workshop (pp. 295–301). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-2041