A Theory of Unsupervised Speech Recognition

Abstract

Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora. While various algorithms exist to solve this problem, a theoretical framework for studying their properties and addressing issues such as sensitivity to hyperparameters and training instability is missing. In this paper, we propose a general theoretical framework for studying the properties of ASR-U systems, based on random matrix theory and the theory of neural tangent kernels. This framework allows us to prove various learnability conditions and sample complexity bounds for ASR-U. Extensive ASR-U experiments on synthetic languages with three classes of transition graphs provide strong empirical evidence for our theory (code available at cactuswiththoughts/UnsupASRTheory.git).

Cite

Wang, L., Hasegawa-Johnson, M., & Yoo, C. D. (2023). A Theory of Unsupervised Speech Recognition. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1192–1215). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.67
