Building the essential resources for Finnish: the Turku Dependency Treebank


This article is free to access.

Abstract

In this paper, we present the final version of a publicly available treebank of Finnish, the Turku Dependency Treebank. The treebank contains 204,399 tokens (15,126 sentences) from 10 different text sources and has been manually annotated in a Finnish-specific version of the well-known Stanford Dependency scheme. The morphological analyses of the treebank have been assigned using a novel machine learning method that disambiguates the readings given by an existing tool. As the second main contribution, we present the first open source Finnish dependency parser, trained on the newly introduced treebank. The parser achieves a labeled attachment score of 81%. The treebank data as well as the parsing pipeline are available under an open license at http://bionlp.utu.fi/.
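For readers unfamiliar with the metric, the labeled attachment score reported above is conventionally the fraction of tokens whose predicted head and dependency label both match the gold annotation. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name, the (head, label) tuple representation, and the toy labels are assumptions made for illustration, and the abstract does not say how details such as punctuation were handled.

```python
def labeled_attachment_score(gold, predicted):
    """Return the share of tokens with both head and label correct.

    gold, predicted: lists of (head_index, dependency_label) tuples,
    one per token, in the same token order (an assumed representation).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    # A token counts as correct only if head AND label both match.
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy four-token sentence; head 0 marks the root. Labels are illustrative.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nommod"), (2, "punct")]

las = labeled_attachment_score(gold, pred)
print(f"LAS = {las:.2%}")
```

In this toy case the third token has the right head but the wrong label, so it counts as an error under the labeled score even though an unlabeled score would accept it.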


Haverinen, K., Nyblom, J., Viljanen, T., Laippala, V., Kohonen, S., Missilä, A., … Ginter, F. (2014). Building the essential resources for Finnish: the Turku Dependency Treebank. Language Resources and Evaluation, 48(3), 493–531. https://doi.org/10.1007/s10579-013-9244-1
