Phone-ing it in: Towards Flexible, Multi-Modal Language Model Training using Phonetic Representations of Data

Citations: 4 | Readers: 41 (Mendeley)

Abstract

Multi-modal techniques offer significant untapped potential to unlock improved NLP technology for local languages. However, many advances in language model pre-training are focused on text, a fact that only increases systematic inequalities in the performance of NLP tasks across the world's languages. In this work, we propose a multi-modal approach to train language models using whatever text and/or audio data might be available in a language. Initial experiments using Swahili and Kinyarwanda data suggest the viability of the approach for downstream Named Entity Recognition (NER) tasks, with models pre-trained on phone data showing an improvement of up to 6% F1-score above models that are trained from scratch. Preprocessing and training code will be uploaded to https://github.com/sil-ai/phone-it-in.
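The approach summarized above hinges on mapping text (or audio) into phone sequences before language model pre-training. As a rough illustration of the text side of such a pipeline (a minimal sketch, not the authors' released code; the use of the epitran grapheme-to-phoneme library and the "swa-Latn" language code are assumptions here), the snippet below converts a Swahili sentence into an IPA phone string of the kind that could replace raw text as pre-training input:

    # Sketch: convert Swahili text to a phonetic (IPA) representation.
    # Assumes `pip install epitran`; epitran and "swa-Latn" are illustrative
    # choices, not necessarily the tooling used in the paper.
    import epitran

    epi = epitran.Epitran("swa-Latn")  # Swahili, Latin script
    sentence = "Rais alizungumza na waandishi wa habari."
    phones = epi.transliterate(sentence)
    print(phones)  # IPA string usable as phone-level pre-training data

Audio data would enter the same pipeline through an audio-to-phone recognizer rather than grapheme-to-phoneme conversion, so that both modalities share one phonetic representation.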

Citation (APA)

Leong, C., & Whitenack, D. (2022). Phone-ing it in: Towards Flexible, Multi-Modal Language Model Training using Phonetic Representations of Data. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 5306–5315). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.364
