Diachronic proximity vs. data sparsity in cross-lingual parser projection. A case study on Germanic

Maria Sukhareva; Christian Chiarcos

Conference Proceedings

1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, VarDial 2014 at the 25th International Conference on Computational Linguistics: System Demonstrations, COLING 2014 - Proceedings (2014) 11-20

DOI: 10.3115/v1/w14-5302


Abstract

For the study of historical language varieties, the sparsity of training data poses severe problems for syntactic annotation and for the development of NLP tools that automate the process. In this paper, we explore strategies to compensate for the lack of training data by including data from related varieties in a series of annotation projection experiments from English to four old Germanic languages: on dependency syntax projected from English to one or more languages, we train a fragment-aware parser and apply it to the target language. For parser training, we consider small datasets from the target language as a baseline and compare this baseline with models trained on larger datasets from multiple varieties with different degrees of relatedness, thereby balancing sparsity and diachronic proximity.
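The projection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-to-one word alignments, and the function name `project_heads`, the index conventions, and the toy input are all hypothetical. It shows why projected trees tend to be fragmentary: a target token receives no head whenever it, or the head of its aligned source token, has no alignment.

```python
from typing import Dict, List, Optional


def project_heads(
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[int]]:
    """Project dependency heads from a source sentence onto a target sentence.

    src_heads[i] is the index of the head of source token i, or -1 for the root.
    alignment maps a source token index to a target token index (assumed 1:1).
    Returns one entry per target token: a head index, -1 for the root,
    or None if no head could be projected (the token stays unattached).
    """
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s_dep, s_head in enumerate(src_heads):
        t_dep = alignment.get(s_dep)
        if t_dep is None:
            continue  # source token has no counterpart in the target
        if s_head == -1:
            tgt_heads[t_dep] = -1  # the root relation carries over directly
            continue
        t_head = alignment.get(s_head)
        if t_head is not None:
            tgt_heads[t_dep] = t_head  # both ends aligned: copy the edge
    return tgt_heads


# Hypothetical toy input: four source tokens, four target tokens.
# Source token 0 and target token 3 are unaligned.
heads = project_heads(
    src_heads=[1, 2, -1, 2],
    alignment={1: 0, 2: 2, 3: 1},
    tgt_len=4,
)
print(heads)  # [2, 2, -1, None]
```

The `None` entry marks an unattached target token; a parser trained on such output has to tolerate partial trees, which is one plausible reading of "fragment-aware" in the abstract.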

Cite


Sukhareva, M., & Chiarcos, C. (2014). Diachronic proximity vs. data sparsity in cross-lingual parser projection. A case study on Germanic. In 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, VarDial 2014 at the 25th International Conference on Computational Linguistics: System Demonstrations, COLING 2014 - Proceedings (pp. 11–20). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-5302
