Abstract
Multi-modal techniques offer significant untapped potential to unlock improved NLP technology for local languages. However, many advances in language model pre-training are focused on text, a fact that only increases systematic inequalities in the performance of NLP tasks across the world's languages. In this work, we propose a multi-modal approach to training language models using whatever text and/or audio data might be available in a language. Initial experiments using Swahili and Kinyarwanda data suggest the viability of the approach for downstream Named Entity Recognition (NER) tasks, with models pre-trained on phone data showing an improvement of up to 6% in F1 score over models trained from scratch. Preprocessing and training code will be uploaded to https://github.com/sil-ai/phone-it-in.
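As a concrete illustration of the preprocessing the abstract describes, the sketch below converts raw text into the kind of phonetic (IPA) representation that phone-based pre-training operates on. It assumes the Epitran grapheme-to-phoneme library and its "swa-Latn" and "kin-Latn" language codes, which are not specified in this abstract; the authors' actual pipeline is in the linked repository.

```python
# Minimal sketch of text-to-phone preprocessing, assuming the Epitran
# grapheme-to-phoneme library (an assumption; see the repo above for the
# authors' actual preprocessing code).
import epitran

# Epitran language codes for Swahili and Kinyarwanda in Latin script;
# availability of these specific codes is an assumption here.
epi_sw = epitran.Epitran("swa-Latn")
epi_rw = epitran.Epitran("kin-Latn")

text_sw = "Rais amewasili Nairobi leo asubuhi."  # example Swahili sentence
phones_sw = epi_sw.transliterate(text_sw)        # IPA phone string

# The phone strings produced this way can serve as pre-training input for a
# standard masked language model, which is then fine-tuned for NER.
print(phones_sw)
```

Audio data would presumably enter the same pipeline via a phone recognizer rather than grapheme-to-phoneme conversion, giving both modalities a shared phonetic representation for pre-training.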
Citation
Leong, C., & Whitenack, D. (2022). Phone-ing it in: Towards Flexible, Multi-Modal Language Model Training using Phonetic Representations of Data. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 5306–5315). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.364