Transfer-based Enrichment of a Hungarian Named Entity Dataset

Attila Novák; Borbála Novák

Conference Proceedings (Open Access)

International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pp. 1060–1067

DOI: 10.26615/978-954-452-072-4_119

Abstract

In this paper, we present a major update to the first Hungarian named entity dataset, the Szeged NER corpus. We used zero-shot cross-lingual transfer to bootstrap the enrichment of the entity types annotated in the corpus, relying on three neural NER models fine-tuned from multilingual neural language models: two trained on the English OntoNotes corpus and one on the Czech Named Entity Corpus. The models' output was automatically merged with the original NER annotation, then corrected both automatically and manually, and further enriched with additional annotation, such as qualifiers for various entity types. We evaluate the zero-shot performance of the two OntoNotes-based models, as well as the performance of a new transformer-based NER model trained on the training portion of the final corpus. We release both the corpus and the trained model.
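
The abstract does not spell out how model output was merged with the original annotation. The following is a minimal sketch under stated assumptions, not the authors' implementation: entities are token-level spans with an exclusive end; original spans always take precedence; model spans are added only if they overlap nothing accepted so far; and models are consulted in priority order. A rejected model span whose boundaries differ from the span it clashes with is set aside for manual review. The `Span` type, the `merge_annotations` function, and the source names in the example (`"gold"`, `"ontonotes_a"`, `"cnec"`) are hypothetical. Mapping between label schemes (for example OntoNotes labels vs. the corpus's own labels) is deliberately left out.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """A token-level entity span; `end` is exclusive (hypothetical format)."""
    start: int
    end: int
    label: str
    source: str  # "gold" for the original annotation, or a model name


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open token ranges share at least one token."""
    return a.start < b.end and b.start < a.end


def merge_annotations(
    original: List[Span], model_outputs: List[List[Span]]
) -> Tuple[List[Span], List[Span]]:
    """Merge model predictions into the original annotation.

    Assumed policy (illustrative only): original spans always win; a model
    span is added only if it overlaps no span accepted so far; models are
    consulted in list order, so earlier models take priority. A rejected
    span whose boundaries differ from every span it clashes with is
    returned as a conflict for manual review; a rejected span with
    identical boundaries is treated as redundant and dropped.
    """
    accepted: List[Span] = list(original)
    conflicts: List[Span] = []
    for predictions in model_outputs:
        for span in sorted(predictions, key=lambda s: (s.start, s.end)):
            clashing = [s for s in accepted if overlaps(s, span)]
            if not clashing:
                accepted.append(span)
            elif all((s.start, s.end) != (span.start, span.end) for s in clashing):
                conflicts.append(span)
    accepted.sort(key=lambda s: (s.start, s.end))
    return accepted, conflicts


if __name__ == "__main__":
    original = [Span(0, 2, "PER", "gold")]
    model_a = [Span(0, 2, "PERSON", "ontonotes_a"), Span(5, 6, "DATE", "ontonotes_a")]
    model_b = [Span(1, 3, "ORG", "cnec"), Span(5, 6, "DATE", "cnec")]
    merged, to_review = merge_annotations(original, [model_a, model_b])
    print(merged)     # the gold PER span plus the new DATE span from model_a
    print(to_review)  # the cnec ORG span, whose boundaries clash with gold
```

In this toy example, the first model contributes a DATE entity that the original annotation lacked. The third model's ORG span crosses the boundary of an existing PER span, so it is queued for review rather than silently discarded. A rule set like this only initializes the enrichment; the abstract notes that the merged output was subsequently corrected both automatically and manually.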

Citation (APA)

Novák, A., & Novák, B. (2021). Transfer-based Enrichment of a Hungarian Named Entity Dataset. In International Conference on Recent Advances in Natural Language Processing, RANLP (pp. 1060–1067). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_119