Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Producing IGT manually is time-consuming and requires linguistic expertise. We address this issue by building automatic glossing models: modern multi-source neural models that additionally leverage easy-to-collect translations. We further refine our models through cross-lingual transfer and a simple output length control mechanism. Evaluated on three challenging low-resource scenarios, our approach significantly outperforms a recent state-of-the-art baseline, improving overall accuracy as well as lemma and tag recall.
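The output length control mentioned above exploits a structural property of IGT: each source token receives exactly one gloss, so the target length is known before decoding. The sketch below illustrates this idea only as a post-hoc constraint; the function name, padding symbol, and example sentence are hypothetical, not the authors' implementation.

```python
def control_gloss_length(src_tokens, predicted_glosses, pad_gloss="UNK"):
    """Force the gloss sequence to match the source token count.

    IGT aligns one gloss line entry per source token, so the target
    length is known a priori: truncate over-long predictions and pad
    short ones with a placeholder gloss.
    """
    n = len(src_tokens)
    glosses = list(predicted_glosses[:n])
    glosses += [pad_gloss] * (n - len(glosses))
    return glosses


# Hypothetical Swahili example: "nilisoma kitabu jana" ("I read a book
# yesterday") with a spurious fourth gloss hallucinated by the model.
src = ["ni-li-soma", "kitabu", "jana"]
pred = ["1SG-PST-read", "book", "yesterday", "SPURIOUS"]
print(control_gloss_length(src, pred))
```

In the multi-source setting described in the abstract, a constraint like this would more naturally be enforced inside the decoder (e.g., by forbidding the end-of-sequence symbol until the known length is reached), but the alignment assumption is the same.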
Citation
Zhao, X., Ozaki, S., Anastasopoulos, A., Neubig, G., & Levin, L. (2020). Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020) (pp. 5397–5408). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.471