Designing and comparing G2P-type lemmatizers for a morphology-rich language

Steffen Eger

Conference Proceedings

Communications in Computer and Information Science (2015) 537 27-40

DOI: 10.1007/978-3-319-23980-4_2

Abstract

We consider the statistical lemmatization problem, in which lemmatizers are trained on (word form, lemma) pairs. In particular, we consider this problem for ancient Latin, a language with a high degree of morphological variability. We investigate whether general-purpose string-to-string transduction models are suitable for this task, and find that they typically perform (much) better than more restricted lemmatization techniques and heuristics based on suffix transformations. We also experimentally test whether string transduction systems that perform well on one string-to-string translation task (here, grapheme-to-phoneme conversion, G2P) also perform well on another (here, lemmatization), and vice versa. We find that joint n-gram modeling performs better on G2P than a discriminative model of our own making, but that this relationship is reversed for lemmatization. Finally, we investigate how the learned lemmatizers can complement lexicon-based systems, e.g., by tackling the out-of-vocabulary (OOV) and/or disambiguation problems.
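To make the contrast in the abstract concrete, the sketch below shows one generic form of the "restricted" suffix-transformation baseline: each training pair is reduced to a rule "strip this suffix, append that one", and a new word receives the most frequent rule for its longest matching ending. It also shows one way a learned lemmatizer could back up a lexicon, covering OOV words and choosing among ambiguous lexicon entries. This is an illustrative sketch under stated assumptions, not the paper's method: the names `suffix_rule`, `SuffixLemmatizer`, `lemmatize`, and `max_context` are hypothetical, the Latin pairs are toy examples rather than the paper's data, and the joint n-gram and discriminative transduction models the paper compares are not shown.

```python
from collections import Counter, defaultdict


def suffix_rule(form, lemma):
    """Return (strip, append): delete `strip` from the end of `form`, then add `append`."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


class SuffixLemmatizer:
    """Hypothetical suffix-transformation baseline trained on (word form, lemma) pairs."""

    def __init__(self, max_context=4):
        self.max_context = max_context  # extra characters of context kept beyond `strip`
        self.table = defaultdict(Counter)  # word ending -> Counter of (strip, append) rules

    def fit(self, pairs):
        for form, lemma in pairs:
            rule = suffix_rule(form, lemma)
            strip_len = len(rule[0])
            # Index the rule under every word ending that contains the stripped suffix,
            # from the bare suffix up to `max_context` extra characters of context.
            for k in range(strip_len, min(len(form), strip_len + self.max_context) + 1):
                self.table[form[len(form) - k:]][rule] += 1
        return self

    def predict(self, word):
        # Back off from the longest matching ending to the shortest (possibly empty) one.
        for k in range(len(word), -1, -1):
            ending = word[len(word) - k:]
            if ending in self.table:
                (strip, append), _ = self.table[ending].most_common(1)[0]
                return word[:len(word) - len(strip)] + append
        return word  # no applicable rule: leave the form unchanged


def lemmatize(word, lexicon, model):
    """Hypothetical lexicon-first lookup with the learned model as a fallback."""
    candidates = lexicon.get(word)
    guess = model.predict(word)
    if not candidates:        # OOV: the lexicon has no entry, so trust the model
        return guess
    if guess in candidates:   # ambiguous entry: let the model choose among lexicon lemmas
        return guess
    return sorted(candidates)[0]  # arbitrary but deterministic choice


PAIRS = [("amat", "amo"), ("amant", "amo"), ("laudat", "laudo"),
         ("portat", "porto"), ("rosam", "rosa"), ("rosae", "rosa")]
LEXICON = {"rosae": {"rosa"}, "amor": {"amor", "amo"}}

model = SuffixLemmatizer().fit(PAIRS)
print(model.predict("cantat"))   # canto
print(model.predict("vocant"))   # voco
print(model.predict("puellam"))  # puella
print(lemmatize("cantat", LEXICON, model))  # canto (OOV, handled by the model)
```

Such a rule table can only copy endings it has already seen, which is one intuition for why the abstract reports that general-purpose string transduction models, able to rewrite arbitrary substrings, typically outperform it.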

Citation (APA)

Eger, S. (2015). Designing and comparing G2P-type lemmatizers for a morphology-rich language. In Communications in Computer and Information Science (Vol. 537, pp. 27–40). Springer Verlag. https://doi.org/10.1007/978-3-319-23980-4_2