Towards a bank of constituent parse trees for Polish

Marek Świdziński; Marcin Woliński

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6231 LNAI 197-204

DOI: 10.1007/978-3-642-15760-8_26

25 Citations

5 Readers

Abstract

We present a project aimed at constructing a bank of constituent parse trees for 20,000 Polish sentences taken from the balanced, hand-annotated subcorpus of the National Corpus of Polish (NKJP). The treebank is to be obtained by automatic parsing followed by manual disambiguation of the resulting trees. The grammar applied in the project is a new version of Świdziński's formal definition of Polish. Each sentence is disambiguated independently by two linguists and, if needed, adjudicated by a supervisor. The feedback from this process is used to iteratively improve the grammar. In the paper, we describe both the linguistic and the technical decisions made in the project. We discuss the overall shape of the parse trees, including the extent of the encoded grammatical information. We also examine syntactic disambiguation, which poses a challenge for this work. © 2010 Springer-Verlag Berlin Heidelberg.

Świdziński, M., & Woliński, M. (2010). Towards a bank of constituent parse trees for Polish. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6231 LNAI, pp. 197–204). https://doi.org/10.1007/978-3-642-15760-8_26
