Enhancing Documentation of Hupa with Automatic Speech Recognition

Zoey Liu; Justin Spence; Emily Prud'hommeaux

Conference Proceedings (Open Access)

COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop (2022) 187-192

DOI: 10.18653/v1/2022.computel-1.23

Abstract

This study investigates applications of automatic speech recognition (ASR) techniques to Hupa, a critically endangered Native American language of the Dene (Athabaskan) language family. Using approximately 9 hours 12 minutes of spoken data produced by one elder who is a first-language Hupa speaker, we experimented with different evaluation schemes and training settings. On average, a fully connected deep neural network reached a word error rate of 35.26%. Our overall results illustrate the utility of ASR for making Hupa language documentation more accessible and usable. In addition, we found that when training acoustic models, using recordings whose transcripts had not been carefully verified did not necessarily harm model performance. This shows promise for speech corpora of indigenous languages that commonly include transcriptions produced by second-language speakers or by linguists with advanced knowledge of the language of interest.
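For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, the function name `word_error_rate` is hypothetical, and the example strings are English placeholders rather than Hupa data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch of the standard WER definition; whitespace
    tokenization is an assumption, not something stated in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder example: one substitution and one deletion against four words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 35.26% corresponds to roughly 35 word-level errors (substitutions, deletions, or insertions) per 100 reference words.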

Citation (APA)

Liu, Z., Spence, J., & Prud’hommeaux, E. (2022). Enhancing Documentation of Hupa with Automatic Speech Recognition. In COMPUTEL 2022 - 5th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop (pp. 187–192). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.computel-1.23
