Pronunciation variation is a major problem in disordered speech recognition. This paper focuses on handling pronunciation variations in dysarthric speech by forming speaker-specific lexicons. A novel approach is proposed for identifying the mispronunciations made by each dysarthric speaker, using the state-specific vectors (SSVs) of a phone-cluster adaptive training (Phone-CAT) acoustic model. An SSV is a low-dimensional vector estimated for each tied state, in which each element denotes the weight of a particular monophone. The dominant weight of an SSV indicates the phone that was actually pronounced. This property is exploited to adapt the lexicon to the pronunciation of each dysarthric speaker. Experimental validation on the Nemours database showed an average relative improvement of 9% across all speakers compared to a system built with the canonical lexicon.
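The dominant-weight idea can be illustrated with a minimal sketch. It is not the authors' implementation: the phone inventory, the function names, and the simplification of keeping one SSV per canonical phone (rather than one per tied state) are all assumptions made for illustration only.

```python
# Illustrative sketch only; all names and data layouts here are hypothetical.
PHONES = ["aa", "b", "d", "p", "t"]  # assumed monophone inventory


def dominant_phone(ssv, phones=PHONES):
    """Return the monophone whose weight is largest in the SSV."""
    best_index = max(range(len(ssv)), key=lambda i: ssv[i])
    return phones[best_index]


def find_substitutions(ssv_by_phone, phones=PHONES):
    """Map each canonical phone to the phone its SSV points at, if different.

    Assumption: one SSV per canonical phone, which simplifies the
    per-tied-state vectors described in the abstract.
    """
    substitutions = {}
    for canonical, ssv in ssv_by_phone.items():
        pronounced = dominant_phone(ssv, phones)
        if pronounced != canonical:
            substitutions[canonical] = pronounced
    return substitutions


def adapt_lexicon(lexicon, substitutions):
    """Build a speaker-specific lexicon by applying the phone substitutions."""
    return {
        word: [substitutions.get(phone, phone) for phone in pronunciation]
        for word, pronunciation in lexicon.items()
    }
```

In this toy setting, a speaker whose SSV for the canonical phone "b" has its largest weight on "p" would have every "b" in the lexicon replaced by "p", while phones whose SSVs agree with the canonical label are left unchanged.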
CITATION STYLE
Sriranjani, R., Umesh, S., & Reddy, M. R. (2015). Pronunciation adaptation for disordered speech recognition using state-specific vectors of phone-cluster adaptive training. In SLPAT 2015 - 6th Workshop on Speech and Language Processing for Assistive Technologies, Proceedings (pp. 72–78). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-5113