We describe how language technology methods and resources devised for Icelandic, a North Germanic language with about 300,000 speakers, can be applied to create digital language resources for Faroese, a North Germanic language with about 50,000 speakers. The project centred on developing a dedicated, high-accuracy part-of-speech (PoS) tagger for Faroese. To this end, ABLTagger, a state-of-the-art neural PoS tagger for Icelandic, was trained on a 100,000-word PoS-tagged Faroese corpus that had been standardised with methods previously applied to Icelandic corpora. The tagger was supplemented with a novel lexicon, the Experimental Database of Faroese Inflection (EDFM), which contains morphological information on 67,488 Faroese words and about one million inflectional forms. The resulting Faroese PoS-tagging model achieves 91.40% overall accuracy under 10-fold cross-validation, currently the highest accuracy reported for a dedicated Faroese PoS tagger. The products of this project are made available for further research in Faroese language technology.
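The 10-fold cross-validation figure can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' evaluation code. The corpus is split into 10 folds, a tagger is trained on 9 folds and tested on the held-out fold, and token-level results are pooled across folds. The interleaved fold assignment, the pooling of token counts (rather than averaging per-fold accuracies), and the hypothetical most-frequent-tag baseline standing in for ABLTagger are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (word form, PoS tag) pairs


def k_fold_splits(sentences: List[Sentence], k: int = 10):
    """Yield (train, test) pairs. ASSUMPTION: fold i holds every k-th
    sentence starting at index i; the paper does not state its fold layout."""
    for i in range(k):
        test = sentences[i::k]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, test


def train_baseline(train: List[Sentence]) -> Dict[str, str]:
    """HYPOTHETICAL stand-in for ABLTagger: map each word form to the
    tag it carries most often in the training folds."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in train:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def cross_validate(sentences: List[Sentence], k: int = 10, fallback: str = "x") -> float:
    """Pooled token accuracy over k folds (ASSUMPTION: counts are pooled,
    not averaged per fold). Unseen words receive the fallback tag."""
    correct = total = 0
    for train, test in k_fold_splits(sentences, k):
        model = train_baseline(train)
        for sent in test:
            for word, gold in sent:
                correct += int(model.get(word, fallback) == gold)
                total += 1
    return correct / total
```

For example, `cross_validate([[("eg", "p"), (f"w{i}", "n")] for i in range(10)])` returns 0.5: the shared word is always tagged correctly, but each unique word appears only in its own test fold and receives the fallback tag. Words unseen in training are exactly the gap that a broad-coverage lexicon such as EDFM is meant to help close.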
Hafsteinsson, H., & Ingason, A. K. (2021). Shared Digital Resource Application within Insular Scandinavian. Proceedings of the Workshop on Computational Methods for Endangered Languages, 1(2). https://doi.org/10.33011/computel.v1i.961