Abstract
This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.
Cite
Östling, R. (2013). Stagger: an Open-Source Part of Speech Tagger for Swedish. Northern European Journal of Language Technology, 3, 1–18. https://doi.org/10.3384/nejlt.2000-1533.1331