Biomedical entity linking is an essential building block for various clinical applications and downstream NLP tasks. However, only a few annotated biomedical datasets with grounded entity mentions are available in non-English languages for training supervised machine learning models. Moreover, the majority of concept aliases in medical vocabularies are available only in English. In this work, we consider the problem of linking disease mentions in Spanish clinical case reports to concept identifiers in SNOMED CT, a comprehensive medical terminology system. For these concepts, only a limited number of aliases are given in the source language, but many more can be obtained from other languages and medical vocabularies. We propose a system that uses these multilingual aliases to retrieve candidate concepts for a given entity mention and re-ranks the retrieved candidates using a trainable cross-encoder. We evaluate our system on the DisTEMIST shared task dataset of the 10th BioASQ challenge. Our results show that supervised re-ranking outperforms the previously best-performing rule-based system while requiring much less task-specific hyperparameter tuning. Detailed ablation experiments demonstrate that multilingual aliases are highly beneficial for improving recall during candidate generation, but hardly affect re-ranking performance.
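The two-stage design described above (candidate retrieval over multilingual aliases, followed by re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the alias table, the concept IDs (`C001` etc.), the character n-gram Jaccard similarity used for retrieval, and the pluggable `score_fn` standing in for a trained cross-encoder are all assumptions made for illustration only.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical toy alias table: concept ID -> aliases drawn from several languages.
# IDs and aliases are illustrative placeholders, not real SNOMED CT content.
ALIASES: dict[str, list[str]] = {
    "C001": ["diabetes mellitus", "diabetes sacarina", "Zuckerkrankheit"],
    "C002": ["hipertensión arterial", "hypertension", "high blood pressure"],
    "C003": ["neumonía", "pneumonia", "lung inflammation"],
}


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_candidates(
    mention: str, aliases: dict[str, list[str]], k: int = 2
) -> list[tuple[str, float]]:
    """Stage 1: score each concept by its best-matching alias in any language."""
    mention_grams = char_ngrams(mention)
    scored = [
        (cid, max(jaccard(mention_grams, char_ngrams(name)) for name in names))
        for cid, names in aliases.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(
    mention: str,
    candidate_ids: list[str],
    score_fn: Callable[[str, str], float],
) -> list[str]:
    """Stage 2: reorder candidates with a (mention, concept) scoring function.

    In a real system, score_fn would be a trained cross-encoder; here it is
    any callable supplied by the caller.
    """
    return sorted(candidate_ids, key=lambda cid: score_fn(mention, cid), reverse=True)
```

In this sketch, a mention such as "hipertension" can match the English alias "hypertension" more closely than the longer Spanish alias, which mirrors the intuition that additional aliases from other languages mainly help the retrieval stage.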
CITATION STYLE
Borchert, F., Llorca, I., & Schapranow, M. P. (2023). Cross-Lingual Candidate Retrieval and Re-ranking for Biomedical Entity Linking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14163 LNCS, pp. 135–147). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-42448-9_12