RuCCoN: Clinical Concept Normalization in Russian

Abstract

We present RuCCoN, a new dataset for clinical concept normalization in Russian, manually annotated by medical professionals. It contains over 16,028 entity mentions manually linked to over 2,409 unique concepts from the Russian-language part of the UMLS ontology. We provide train/test splits for different settings (stratified, zero-shot, and CUI-less) and present strong baselines obtained with state-of-the-art models such as SapBERT. At present, Russian medical NLP lacks both datasets and trained models, and we view this work as an important step towards filling this gap. Our dataset and annotation guidelines are available at https://github.com/sberbank-ai-lab/RuCCoN.

Cite

Nesterov, A., Zubkova, G., Miftahutdinov, Z., Kokh, V., Tutubalina, E., Shelmanov, A., … Nikolenko, S. (2022). RuCCoN: Clinical Concept Normalization in Russian. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 239–245). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-acl.21
