Vocal tract normalization in speech recognition: Compensating for systematic speaker variability

Jordan Cohen; Terri Kamm; Andreas G. Andreou

Journal Article (Open Access)

The Journal of the Acoustical Society of America (1995) 97(5_Supplement) 3246-3247

DOI: 10.1121/1.411700

Abstract

The performance of speech recognition systems is often improved by accounting explicitly for sources of variability in the data. In the SWITCHBOARD corpus, studied during the 1994 CAIP workshop [Frontiers in Speech Processing Workshop II, CAIP (August 1994)], an attempt was made to compensate for the systematic variability due to different vocal tract lengths of various speakers. The method found a maximum-probability parameter for each speaker which mapped an acoustic model to the mean of the models taken from a homogeneous speaker population. The underlying acoustic model was that of a straight tube, and the parameter estimation was accomplished by warping the spectrum of each speaker linearly over a 20% range (actually accomplished by digitally resampling the data), and finding the maximum a posteriori probability of the data given the warp. The technique produces statistically significant improvements in accuracy on a speech transcription task using each of four different speech recognition systems. The best parametrizations were later found to correlate well with vocal tract estimates computed manually from spectrograms.
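
The warp search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: the spectrum is warped by linear interpolation along the frequency axis rather than by resampling the waveform, the reference model is a hypothetical diagonal Gaussian per frequency bin, the 20% range is taken as warp factors 0.90 to 1.10 in steps of 0.01, and the score is a plain log-likelihood (equivalent to a maximum a posteriori choice only under a flat prior over warps). All function and parameter names are illustrative.

```python
import math

def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis: output bin k reads the input at k * alpha."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k * alpha, n - 1)      # clamp reads past the top bin
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped

def log_likelihood(spectrum, mean, var):
    """Log-likelihood of one spectrum under a diagonal Gaussian (assumed reference model)."""
    total = 0.0
    for x, m, v in zip(spectrum, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total

def best_warp(frames, mean, var, low=0.90, high=1.10, steps=21):
    """Grid-search the warp factor that maximizes the total log-likelihood of the frames."""
    best_alpha, best_score = None, -math.inf
    for i in range(steps):
        alpha = low + (high - low) * i / (steps - 1)
        score = sum(log_likelihood(warp_spectrum(f, alpha), mean, var) for f in frames)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Synthetic check: a reference spectrum with one peak, and a "speaker"
# whose frequency axis is stretched by a factor of 1.05.
N_BINS = 64
reference = [math.exp(-((k - 30) / 5.0) ** 2) for k in range(N_BINS)]
speaker = [math.exp(-((k / 1.05 - 30) / 5.0) ** 2) for k in range(N_BINS)]
variance = [0.01] * N_BINS

alpha, score = best_warp([speaker], reference, variance)
print(f"estimated warp factor: {alpha:.2f}")
```

In this toy setup the search should recover a factor close to the stretch that was applied to the synthetic speaker. A real system would score warped features against trained acoustic models rather than a single fixed template.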

Cite

Cohen, J., Kamm, T., & Andreou, A. G. (1995). Vocal tract normalization in speech recognition: Compensating for systematic speaker variability. The Journal of the Acoustical Society of America, 97(5_Supplement), 3246–3247. https://doi.org/10.1121/1.411700
