Abstract
Multi-modal techniques offer significant untapped potential to unlock improved NLP technology for local languages. However, many advances in language model pre-training are focused on text, a fact that only increases systematic inequalities in the performance of NLP tasks across the world's languages. In this work, we propose a multi-modal approach to training language models using whatever text and/or audio data might be available in a language. Initial experiments using Swahili and Kinyarwanda data suggest the viability of the approach for downstream Named Entity Recognition (NER) tasks, with models pre-trained on phone data showing an improvement of up to 6% in F1 score over models trained from scratch. Preprocessing and training code will be uploaded to https://github.com/sil-ai/phone-it-in.
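As a concrete illustration of the preprocessing the abstract describes, the sketch below converts raw text into the kind of phonetic (IPA) representation that phone-based pre-training operates on. It assumes the Epitran grapheme-to-phoneme library and its "swa-Latn" and "kin-Latn" language codes, which are not specified in this abstract; the authors' actual pipeline is in the linked repository.

```python
# Minimal sketch of text-to-phone preprocessing, assuming the Epitran
# grapheme-to-phoneme library (an assumption; see the repo above for the
# authors' actual preprocessing code).
import epitran

# Epitran language codes for Swahili and Kinyarwanda in Latin script;
# availability of these specific codes is an assumption here.
epi_sw = epitran.Epitran("swa-Latn")
epi_rw = epitran.Epitran("kin-Latn")

text_sw = "Rais amewasili Nairobi leo asubuhi."  # example Swahili sentence
phones_sw = epi_sw.transliterate(text_sw)        # IPA phone string

# The phone strings produced this way can serve as pre-training input for a
# standard masked language model, which is then fine-tuned for NER.
print(phones_sw)
```

Audio data would presumably enter the same pipeline via a phone recognizer rather than grapheme-to-phoneme conversion, giving both modalities a shared phonetic representation for pre-training.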
Citation
Leong, C., & Whitenack, D. (2022). Phone-ing it in: Towards Flexible, Multi-Modal Language Model Training using Phonetic Representations of Data. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 5306–5315). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.364