Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning

Eduardo Medeiros; Leonel Corado; Luís Rato; Paulo Quaresma; Pedro Salgueiro

Journal Article (Open Access)

Future Internet (2023) 15(5)

DOI: 10.3390/fi15050159

Abstract

Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the corresponding sequence of words. This paper presents the optimization and evaluation of a deep learning ASR system for European Portuguese. We present a pipeline composed of several stages: data acquisition, analysis, pre-processing, model creation, and evaluation. We propose a transfer learning approach that takes a model optimized for English as its starting point and European Portuguese as its target, with a further contribution to the transfer process from a source in a different domain: a multiple-variant Portuguese dataset composed mostly of Brazilian Portuguese. Domain adaptation between European Portuguese and mixed (mostly Brazilian) Portuguese was investigated. The optimization and evaluation used the NVIDIA NeMo framework, implementing the QuartzNet15×5 architecture, which is based on 1D time-channel separable convolutions. Following this data-centric transfer learning approach, the model was optimized to a state-of-the-art word error rate (WER) of 0.0503.
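
To make the reported metric concrete, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions for illustration only; the abstract does not state which tooling or text normalization the authors used. Under this definition, a WER of 0.0503 corresponds to roughly five word errors per hundred reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch: tokenization is a plain whitespace split, with no
    case folding or punctuation handling (an assumption, not the paper's
    documented procedure).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of five.
    print(word_error_rate("o gato está na sala", "o gato esta na sala"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words, so values are comparable only across systems scored against the same references with the same normalization.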

Citation (APA)

Medeiros, E., Corado, L., Rato, L., Quaresma, P., & Salgueiro, P. (2023). Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning. Future Internet, 15(5). https://doi.org/10.3390/fi15050159
