Vocal tract normalization in speech recognition: Compensating for systematic speaker variability

  • Cohen J
  • Kamm T
  • Andreou A

Abstract

The performance of speech recognition systems is often improved by accounting explicitly for sources of variability in the data. In the SWITCHBOARD corpus, studied during the 1994 CAIP workshop [Frontiers in Speech Processing Workshop II, CAIP (August 1994)], an attempt was made to compensate for the systematic variability due to different vocal tract lengths of various speakers. The method found a maximum probability parameter for each speaker which mapped an acoustic model to the mean of the models taken from a homogeneous speaker population. The underlying acoustic model was that of a straight tube, and the parameter estimation was accomplished by warping the spectrum of each speaker linearly over a 20% range (actually accomplished by digitally resampling the data), and finding the maximum a posteriori probability of the data given the warp. The technique produces statistically significant improvements in accuracy on a speech transcription task using each of four different speech recognition systems. The best parametrizations were later found to correlate well with vocal tract estimates computed manually from spectrograms.
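
The warp search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the 20% range is a grid from 0.90 to 1.10 around no warp, uses plain linear interpolation for the resampling, and replaces the recognizer's model probability with a toy score (the power of a single reference tone). All function names, the grid size, and the test signal are hypothetical.

```python
import cmath
import math


def resample_linear(signal, alpha):
    """Return y[n] = x(alpha * n) using linear interpolation.

    Played back at the original sampling rate, every frequency in the
    input is scaled by alpha, so the spectrum is warped linearly.
    No anti-aliasing filter is applied (acceptable for this toy only).
    """
    out = []
    last = len(signal) - 1
    n = 0
    while True:
        pos = alpha * n
        i = int(pos)
        if i >= last:
            break
        frac = pos - i
        out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
        n += 1
    return out


def candidate_warps(center=1.0, total_range=0.20, steps=11):
    """Evenly spaced warp factors; the 0.90..1.10 grid is an assumption."""
    lo = center - total_range / 2.0
    step = total_range / (steps - 1)
    return [round(lo + k * step, 6) for k in range(steps)]


def best_warp(signal, score, warps):
    """Pick the warp whose resampled signal gets the highest score.

    In the abstract the score is the probability of the data under the
    reference acoustic model; here `score` is any callable stand-in.
    """
    return max(warps, key=lambda a: score(resample_linear(signal, a)))


def tone_power(samples, freq_hz, rate_hz):
    """Toy score: normalized power of the signal at one frequency."""
    w = 2.0 * math.pi * freq_hz / rate_hz
    acc = sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(samples))
    return abs(acc) ** 2 / len(samples) ** 2


# Hypothetical data: a "speaker" whose resonance sits at 125 Hz, scored
# against a "reference population" resonance at 120 Hz.
RATE = 8000
speaker = [math.sin(2.0 * math.pi * 125.0 * n / RATE) for n in range(4000)]
warps = candidate_warps()
chosen = best_warp(speaker, lambda s: tone_power(s, 120.0, RATE), warps)
print("chosen warp factor:", chosen)
```

With a real recognizer, the toy score would be replaced by the probability of the speaker's warped data under the reference model, and the chosen factor would then be applied to all of that speaker's data.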

Cite

Cohen, J., Kamm, T., & Andreou, A. G. (1995). Vocal tract normalization in speech recognition: Compensating for systematic speaker variability. The Journal of the Acoustical Society of America, 97(5_Supplement), 3246–3247. https://doi.org/10.1121/1.411700
