Morphological Segmentation for Seneca

Zoey Liu; Robbie Jimerson; Emily Prud’hommeaux

Conference Proceedings

Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021 (2021) 90-101

DOI: 10.18653/v1/2021.americasnlp-1.10

Abstract

This study takes up the task of low-resource morphological segmentation for Seneca, a critically endangered and morphologically complex Native American language primarily spoken in what is now New York State and Ontario. The labeled data in our experiments comes from two sources: one digitized from a publicly available grammar book and the other collected from informal sources. We treat these two sources as distinct domains and investigate two evaluation designs for model selection. The first follows standard practice and evaluates models on the in-domain development set, while the second evaluates them on a development set drawn from the other domain, that is, the out-of-domain development set. Across a series of monolingual and cross-linguistic training settings, our results demonstrate the utility of a neural encoder-decoder architecture when coupled with multitask learning.
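The two evaluation designs described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' setup: the names `accuracy` and `select_model`, the exact-match metric, the two rule-based stand-in "models", and the toy English-like word lists that stand in for the two domains. The abstract does not specify the metric or the data format, and no Seneca data is reproduced here.

```python
from typing import Callable, Dict, List, Tuple

# A labeled example: (surface word, gold segmentation with '-' between morphemes).
Example = Tuple[str, str]
# A "model" is any function mapping a word to a predicted segmentation (hypothetical).
Model = Callable[[str], str]


def accuracy(model: Model, dev_set: List[Example]) -> float:
    """Fraction of words whose predicted segmentation matches the gold one exactly."""
    if not dev_set:
        return 0.0
    correct = sum(1 for word, gold in dev_set if model(word) == gold)
    return correct / len(dev_set)


def select_model(candidates: Dict[str, Model], dev_set: List[Example]) -> str:
    """Return the name of the candidate with the highest accuracy on dev_set."""
    return max(candidates, key=lambda name: accuracy(candidates[name], dev_set))


# Two toy stand-in models (assumptions, not real segmenters).
def split_final_s(word: str) -> str:
    return word[:-1] + "-s" if word.endswith("s") else word


def split_final_ed(word: str) -> str:
    return word[:-2] + "-ed" if word.endswith("ed") else word


candidates: Dict[str, Model] = {
    "suffix_s": split_final_s,
    "suffix_ed": split_final_ed,
}

# Toy placeholder dev sets standing in for the two domains.
in_domain_dev: List[Example] = [
    ("cats", "cat-s"), ("dogs", "dog-s"), ("walked", "walk-ed"),
]
out_of_domain_dev: List[Example] = [
    ("jumped", "jump-ed"), ("talked", "talk-ed"), ("birds", "bird-s"),
]

# Design 1: select the model using the in-domain development set.
best_in_domain = select_model(candidates, in_domain_dev)
# Design 2: select the model using the out-of-domain development set.
best_out_of_domain = select_model(candidates, out_of_domain_dev)

print(best_in_domain, best_out_of_domain)
```

In this toy setting the two designs pick different candidates, which is the kind of disagreement that makes the choice of development set matter for model selection.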

Cite

Liu, Z., Jimerson, R., & Prud’hommeaux, E. (2021). Morphological Segmentation for Seneca. In Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021 (pp. 90–101). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.americasnlp-1.10
