Abstract
The paper presents PDBUD, the largest Polish Dependency Bank in Universal Dependencies format, with 22K trees and 352K tokens. PDBUD builds on its previous version, the Polish UD treebank (PL-SZ), and contains all 8K PL-SZ trees, which were checked and, where necessary, corrected for the current edition. A further 14K trees are automatically converted from a new version of the Polish Dependency Bank. The PDBUD trees are extended with enhanced edges that encode the shared dependents and shared governors of coordinated conjuncts, and with semantic roles for some dependents. Evaluation experiments show that PDBUD is large enough to train a high-quality graph-based dependency parser for Polish.
Cite
Wróblewska, A. (2018). Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format. In EMNLP 2018 - 2nd Workshop on Universal Dependencies, UDW 2018 - Proceedings of the Workshop (pp. 173–182). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-6020