Named entity recognition for Icelandic: Annotated corpus and models

Abstract

Named entity recognition (NER) can be a challenging task, especially in highly inflected languages where each entity can have many different surface forms. We have created the first NER corpus for Icelandic by annotating 48,371 named entities (NEs) using eight NE types, in a text corpus of 1 million tokens. Furthermore, we have used the corpus to train three machine learning models: first, a CRF model that makes use of shallow word features and a gazetteer function; second, a perceptron model with shallow word features and externally trained word clusters; and third, a BiLSTM model with external word embeddings. Finally, we applied simple voting to combine the model outputs. The voting method obtains an $F_1$ score of 85.79, gaining 1.89 points compared to the best performing individual model. The corpus and the models are publicly available.
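
The abstract does not spell out the voting scheme, so the following is only a minimal sketch of one plausible form of simple voting: a per-token majority vote over the label sequences produced by three taggers. The function name `majority_vote`, the `fallback` tie-break rule, the example sentence, and the label names (`B-Person`, `B-Location`, `O`) are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine per-token labels from several taggers by simple majority.

    predictions: one label sequence per tagger, all of the same length.
    fallback: index of the tagger whose label is used when no label
              wins a strict majority (a hypothetical tie-break rule).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must label the same number of tokens")

    combined = []
    for labels in zip(*predictions):
        label, freq = Counter(labels).most_common(1)[0]
        # No strict majority (e.g. all three taggers disagree):
        # defer to the designated fallback tagger.
        if freq <= len(labels) / 2:
            label = labels[fallback]
        combined.append(label)
    return combined


# Hypothetical outputs for the tokens: Jón býr í Reykjavík
crf        = ["B-Person", "O", "O", "B-Location"]
perceptron = ["B-Person", "O", "O", "O"]
bilstm     = ["O",        "O", "O", "B-Location"]

combined = majority_vote([crf, perceptron, bilstm])
```

In this sketch each tagger makes one mistake, but no two taggers make the same one, so the combined sequence recovers both entities. That is the intuition behind a voting ensemble scoring above its best individual member, as the reported 1.89-point gain suggests.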

Cite

Ingólfsdóttir, S. L., Guðjónsson, Á. A., & Loftsson, H. (2020). Named entity recognition for Icelandic: Annotated corpus and models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12379 LNAI, pp. 46–57). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-59430-5_4
