Automatic Interlinear Glossing for Otomi language

11Citations
Citations of this article
50Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In linguistics, interlinear glossing is an essential procedure for analyzing the morphology of languages. This type of annotation is useful for language documentation, and it can also provide valuable data for NLP applications. We perform automatic glossing for Otomi, an under-resourced language. Our work also comprises the pre-processing and annotation of the corpus. We implement different sequential labelers. CRF models represented an efficient and good solution for our task (accuracy above 90%). Two main observations emerged from our work: 1) models with a higher number of parameters (RNNs) performed worse in our low-resource scenario; and 2) the information encoded in the CRF feature function plays an important role in the prediction of labels; however, even in cases where POS tags are not available it is still possible to achieve competitive results.

Cite

CITATION STYLE

APA

Barriga, D., Mijangos, V., & Gutierrez-Vasques, X. (2021). Automatic Interlinear Glossing for Otomi language. In Proceedings of the 1st Workshop on Natural Language Processing for Indigenous Languages of the Americas, AmericasNLP 2021 (pp. 34–43). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.americasnlp-1.5

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free