ASR pipeline for low-resourced languages: A case study on Pomak

Abstract

Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages can be a challenging task not only due to lack of resources but also due to the work required for the preparation of a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.

Citation (APA)

Tsoukala, C., Kritsis, K., Douros, I., Katsamanis, A., Kokkas, N., Arampatzakis, V., … Pavlidis, G. (2023). ASR pipeline for low-resourced languages: A case study on Pomak. In FieldMatters 2023 - 2nd Workshop on NLP Applications to Field Linguistics, Proceedings (pp. 30–39). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.fieldmatters-1.5
