This study addresses low-resource morphological segmentation for Seneca, a critically endangered and morphologically complex Native American language spoken primarily in what is now New York State and Ontario. The labeled data in our experiments come from two sources: one digitized from a publicly available grammar book and the other collected from informal sources. We treat these two sources as distinct domains and investigate different evaluation designs for model selection. The first design follows standard practice and evaluates models on the in-domain development set, while the second evaluates them on the out-of-domain development set, which serves as a development domain. Across a series of monolingual and cross-linguistic training settings, our results demonstrate the utility of a neural encoder-decoder architecture when coupled with multitask learning.
CITATION
Liu, Z., Jimerson, R., & Prud’hommeaux, E. (2021). Morphological Segmentation for Seneca. In Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021 (pp. 90–101). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.americasnlp-1.10