Transfer-based Enrichment of a Hungarian Named Entity Dataset

Abstract

In this paper, we present a major update to the first Hungarian named entity dataset, the Szeged NER corpus. We used zero-shot cross-lingual transfer to initialize the enrichment of the entity types annotated in the corpus, using three neural NER models: two based on the English OntoNotes corpus and one based on the Czech Named Entity Corpus, fine-tuned from multilingual neural language models. The output of the models was automatically merged with the original NER annotation, then corrected both automatically and manually, and further enriched with additional annotation, such as qualifiers for various entity types. We present an evaluation of the zero-shot performance of the two OntoNotes-based models and of a new transformer-based NER model trained on the training part of the final corpus. We release the corpus and the trained model.

Cite

Novák, A., & Novák, B. (2021). Transfer-based Enrichment of a Hungarian Named Entity Dataset. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 1060–1067). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_119