Abstract
This paper presents several subword-modelling-based approaches to interlinear glossing for seven under-resourced languages as part of the 2023 SIGMORPHON shared task on interlinear glossing (Ginn et al., 2023). In an interlinear glossed text (IGT), each line of the original text is paired with one or more corresponding lines that encode its underlying grammatical structure. While expert-annotated glossed text is especially valuable for the study of low-resource languages in both theoretical linguistics and natural language processing, generating high-quality glossed data is expensive and time-consuming. Approaches that automatically or semi-automatically generate glossed data are therefore valuable for linguistic research. We experiment with various augmentation and tokenization strategies for both the open and closed data tracks. We find that while subword models may perform well with greater amounts of data, character-based approaches remain competitive in lower-resource settings.
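To make the IGT structure described above concrete, a minimal sketch of such a record follows. The example sentence, glosses, and the `IGTEntry` class are hypothetical illustrations, not taken from the paper or the shared-task data; the comparison of character-level units with subword units is likewise only schematic.

```python
# Minimal sketch of an interlinear glossed text (IGT) record.
# The example sentence and its glosses are hypothetical and purely illustrative.
from dataclasses import dataclass


@dataclass
class IGTEntry:
    transcription: str  # original-language line
    gloss: str          # morpheme-by-morpheme grammatical gloss
    translation: str    # free translation


entry = IGTEntry(
    transcription="ni-ka-soma kitabu",
    gloss="1SG-PST-read book",
    translation="I read a book",
)

# A character-level model treats every symbol as a unit (here "#" marks a
# word boundary), whereas a subword model would merge frequent character
# sequences such as "ni" or "soma" into single tokens.
char_tokens = list(entry.transcription.replace(" ", "#"))
print(char_tokens)
```

Each gloss element aligns with one morpheme of the transcription, which is what makes expert-annotated IGT so informative for downstream modelling.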
Citation
Cross, Z., Yun, M., Apparaju, A., MacCabe, J., Nicolai, G., & Silfverberg, M. (2023). Glossy Bytes: Neural Glossing using Subword Encoding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 222–229). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.sigmorphon-1.24