Shared Digital Resource Application within Insular Scandinavian

  • Hafsteinsson H
  • Ingason A

Abstract

We describe the application of language technology methods and resources devised for Icelandic, a North Germanic language with about 300,000 speakers, to the creation of digital language resources for Faroese, a North Germanic language with about 50,000 speakers. The project developed a dedicated, high-accuracy part-of-speech (PoS) tagger for Faroese. To achieve this, ABLTagger, a state-of-the-art neural PoS tagger for Icelandic, was trained on a 100,000-word PoS-tagged Faroese corpus, standardised with methods previously applied to Icelandic corpora. The tagger was supplemented with a novel Experimental Database of Faroese Inflection (EDFM), a lexicon containing morphological information on 67,488 Faroese words with about one million inflectional forms. The resulting PoS-tagging model for Faroese achieves 91.40% overall accuracy when evaluated with 10-fold cross-validation, currently the highest accuracy for a dedicated Faroese PoS tagger. The products of this project are made available for further research in Faroese language technology.
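To make the evaluation protocol concrete, the following is a minimal sketch of 10-fold cross-validation for a PoS tagger. It is not the authors' pipeline: the most-frequent-tag baseline stands in for ABLTagger, the function names (`k_fold_splits`, `train_baseline`, `cross_validate`) and the toy tags are hypothetical, and pooling token counts across folds is an assumption about how "overall accuracy" is computed.

```python
from collections import Counter, defaultdict


def k_fold_splits(n_items, k=10):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def train_baseline(sentences):
    """Most-frequent-tag baseline: a stand-in model, NOT ABLTagger."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    # Unknown words receive the most common tag overall
    # (requires a non-empty training set).
    default = all_tags.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda word: lexicon.get(word, default)


def cross_validate(sentences, k=10):
    """Return token accuracy pooled over all k held-out folds (an assumption)."""
    correct = total = 0
    for train_idx, test_idx in k_fold_splits(len(sentences), k):
        tagger = train_baseline([sentences[i] for i in train_idx])
        for i in test_idx:
            for word, gold in sentences[i]:
                correct += tagger(word) == gold
                total += 1
    return correct / total


# Toy corpus with hypothetical tags; each sentence is a list of (word, tag) pairs.
toy_corpus = [[("a", "X"), ("b", "Y")] for _ in range(20)]
accuracy = cross_validate(toy_corpus, k=10)
```

Each sentence is held out exactly once, so every token contributes to the final figure. With a real corpus, the sentences would normally be shuffled before splitting so that folds do not follow document order.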

Cite

Hafsteinsson, H., & Ingason, A. K. (2021). Shared Digital Resource Application within Insular Scandinavian. Proceedings of the Workshop on Computational Methods for Endangered Languages, 1(2). https://doi.org/10.33011/computel.v1i.961
