ASR pipeline for low-resourced languages: A case study on Pomak

Chara Tsoukala; Kosmas Kritsis; Ioannis Douros; Athanasios Katsamanis; Nikolaos Kokkas; Vasileios Arampatzakis; Vasileios Sevetlidis; Stella Markantonatou; George Pavlidis

Conference Proceedings, Open Access

FieldMatters 2023 - 2nd Workshop on NLP Applications to Field Linguistics, Proceedings (2023) 30-39

DOI: 10.18653/v1/2023.fieldmatters-1.5

Abstract

Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages is challenging not only because resources are scarce but also because preparing a training dataset takes considerable work. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.

Citation (APA)

Tsoukala, C., Kritsis, K., Douros, I., Katsamanis, A., Kokkas, N., Arampatzakis, V., … Pavlidis, G. (2023). ASR pipeline for low-resourced languages: A case study on Pomak. In FieldMatters 2023 - 2nd Workshop on NLP Applications to Field Linguistics, Proceedings (pp. 30–39). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.fieldmatters-1.5
