Using ASR-Generated Text for Spoken Language Modeling

Abstract

This paper aims to improve spoken language modeling (LM) using a very large amount of automatically transcribed speech. We leverage the INA (French National Audiovisual Institute) collection and obtain 19GB of text after applying ASR to 350,000 hours of diverse TV shows. From this text, spoken language models are trained either by fine-tuning an existing LM (FlauBERT) or by training an LM from scratch. The new models (FlauBERT-Oral) are shared with the community and are evaluated not only in terms of word prediction accuracy but also on two downstream tasks: classification of TV shows and syntactic parsing of speech. Experimental results show that FlauBERT-Oral is better than the initial FlauBERT version, demonstrating that, despite its inherently noisy nature, ASR-generated text can be useful for improving spoken language modeling.

Cite

Hervé, N., Pelloin, V., Favre, B., Dary, F., Laurent, A., Meignier, S., & Besacier, L. (2022). Using ASR-Generated Text for Spoken Language Modeling. In 2022 Challenges and Perspectives in Creating Large Language Models, Proceedings of the Workshop (pp. 17–25). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.bigscience-1.2
