Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition

Abstract

This paper presents an English-Korean parallel dataset of 381K news articles, of which 1,400 articles, comprising 10K sentences, are manually labeled for crosslingual named entity recognition (NER). The annotation guidelines for the two languages are developed in parallel and yield inter-annotator agreement scores of 91% and 88% for English and Korean respectively, indicating high-quality annotation in our dataset. Three types of crosslingual learning approaches, direct model transfer, embedding projection, and annotation projection, are used to develop zero-shot Korean NER models. Our best model achieves an F1-score of 51%, which is very encouraging considering the extremely distinct natures of these two languages. This is pioneering work that explores zero-shot crosslingual learning between English and Korean and provides rich parallel annotation for a core NLP task, named entity recognition.
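
Of the three approaches named above, annotation projection is the easiest to illustrate in a few lines: entity labels on the English side of a sentence pair are copied onto the Korean tokens they align to. The following is a minimal sketch of that general idea, not the authors' implementation. The function name `project_annotations`, the BIO tag scheme, the toy sentence pair, and the hand-written word alignment are all assumptions made for illustration; in practice the alignment would come from an automatic word aligner.

```python
from typing import List, Tuple


def project_annotations(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Project BIO tags from source tokens onto target tokens.

    `alignment` holds (source_index, target_index) pairs. This helper is
    hypothetical and only illustrates the general technique.
    """
    # Step 1: give each target token the entity type of an aligned source
    # token. "O" doubles as the "no entity" marker; the first alignment
    # that carries an entity type wins.
    tgt_types = ["O"] * tgt_len
    for src_idx, tgt_idx in sorted(alignment):
        tag = src_tags[src_idx]
        if tag != "O" and tgt_types[tgt_idx] == "O":
            tgt_types[tgt_idx] = tag.split("-", 1)[1]

    # Step 2: rebuild BIO prefixes on the target side, because word order
    # may differ between the two languages. Contiguous target tokens of
    # the same type are treated as one entity.
    projected = []
    prev_type = "O"
    for ent_type in tgt_types:
        if ent_type == "O":
            projected.append("O")
        elif ent_type == prev_type:
            projected.append("I-" + ent_type)
        else:
            projected.append("B-" + ent_type)
        prev_type = ent_type
    return projected


# Toy example (assumed data): the verb and the location swap positions.
english = ["Joe", "Biden", "visited", "Seoul"]
english_tags = ["B-PER", "I-PER", "O", "B-LOC"]
korean = ["조", "바이든은", "서울을", "방문했다"]
toy_alignment = [(0, 0), (1, 1), (2, 3), (3, 2)]

korean_tags = project_annotations(english_tags, toy_alignment, len(korean))
for token, tag in zip(korean, korean_tags):
    print(token, tag)
```

One limitation of this sketch is that two adjacent target entities of the same type are merged into a single span; a fuller implementation would track source entity boundaries as well as types.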

Cite

Kim, J., Choi, N., Lim, S. S., Kim, J., Chung, S., Woo, H., … Choi, J. D. (2021). Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition. In MRL 2021 - 1st Workshop on Multilingual Representation Learning, Proceedings of the Conference (pp. 224–237). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.mrl-1.19