Glossy Bytes: Neural Glossing using Subword Encoding


Abstract

This paper presents several subword-modelling-based approaches to interlinear glossing for seven under-resourced languages as part of the 2023 SIGMORPHON shared task on interlinear glossing (Ginn et al., 2023). In an interlinear glossed text (IGT), each line of the original text is paired with one or more corresponding lines which encode the underlying grammatical structure. While expert-annotated glossed text is especially valuable for the study of low-resource languages in both theoretical linguistics and natural language processing, generating high-quality glossed data is expensive and time-consuming. Therefore, approaches which automatically or semi-automatically generate glossed data can be valuable for linguistic research. We experiment with various augmentation and tokenization strategies for both the open and closed data tracks. We find that while subword models may perform well when larger amounts of data are available, character-based approaches remain competitive in lower-resource settings.
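The contrast the abstract draws between subword and character-based modelling can be illustrated with a small sketch. The greedy longest-match segmenter and the toy vocabulary below are hypothetical illustrations, not the paper's actual method (which uses learned subword models such as BPE):

```python
def char_tokenize(word):
    """Character-level tokenization: every symbol is its own token."""
    return list(word)

def subword_tokenize(word, vocab):
    """Greedy longest-match subword segmentation over a toy vocabulary.

    An illustrative stand-in for learned subword models; the vocabulary
    here is hypothetical, not taken from the paper.
    """
    tokens = []
    i = 0
    while i < len(word):
        # Take the longest vocabulary entry starting at position i;
        # fall back to a single character if nothing matches.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"gloss", "ing"}
print(char_tokenize("glossing"))           # one token per character
print(subword_tokenize("glossing", vocab)) # ['gloss', 'ing']
```

Character models see longer sequences of smaller units, which can help when training data is scarce; subword models compress frequent morpheme-like chunks into single tokens, which pays off as data grows.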

Citation (APA)

Cross, Z., Yun, M., Apparaju, A., MacCabe, J., Nicolai, G., & Silfverberg, M. (2023). Glossy Bytes: Neural Glossing using Subword Encoding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 222–229). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sigmorphon-1.24
