In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis. We report on several experiments in enriching the training data for this specific construction, evaluated on five languages: Czech, English, Finnish, Russian and Slovak. The data enrichment methods draw upon self-training and tri-training, combined with a stratified sampling method that mimics the structural complexity of the original treebank. In addition, using the same methods, we demonstrate small improvements over the winning system of the CoNLL-17 parsing shared task for four of the five languages; these improvements are not restricted to elliptical constructions.
Droganova, K., Ginter, F., Kanerva, J., & Zeman, D. (2018). Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions. In EMNLP 2018 - 2nd Workshop on Universal Dependencies, UDW 2018 - Proceedings of the Workshop (pp. 47–54). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-6006