Normalization of formant frequencies has frequently been used to eliminate inter-speaker differences in vowel recognition. However, estimation of formant frequencies becomes difficult under certain circumstances, such as for telephone speech. This paper presents an approach to vowel normalization based on frequency-warped spectral matching. A frequency-normalized distance between test and reference spectra is defined as the minimum mean square difference over all possible choices of frequency warping functions under certain nonlinearity constraints and boundary conditions. After adaptively eliminating spectral slope differences due to individual glottal characteristics, the spectral distance is computed by means of dynamic programming. Vowel identification experiments were conducted on the nine American English vowels in /hvd/ utterances spoken by 12 male and 12 female speakers. The results indicated that the frequency warping method substantially increased the identification scores for female vowels when the male vowels were used as reference. They also indicated that although the improvement in identification was attributed mainly to linear frequency scaling, an additional improvement for vowel /ae/ was obtained by a slight nonlinear frequency warping. In addition, an application to speaker normalization for word detection in connected speech is discussed. © 1986.
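As a rough illustration of the distance described above, the following Python sketch removes a spectral tilt and then searches monotone frequency warpings by dynamic programming. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `remove_tilt` and `warped_distance`, the fixed least-squares tilt removal (standing in for the adaptive slope elimination), and the local step set `steps` (standing in for the nonlinearity constraints) are all hypothetical choices made for this example.

```python
import math


def remove_tilt(spec):
    """Subtract the least-squares line from a log spectrum (needs len >= 2).

    Assumption: a fixed linear detrend stands in for the adaptive
    spectral slope elimination mentioned in the abstract.
    """
    n = len(spec)
    xm = (n - 1) / 2
    ym = sum(spec) / n
    sxx = sum((k - xm) ** 2 for k in range(n))
    slope = sum((k - xm) * (y - ym) for k, y in enumerate(spec)) / sxx
    return [y - ym - slope * (k - xm) for k, y in enumerate(spec)]


def warped_distance(test, ref, steps=(0, 1, 2)):
    """Minimum mean square difference over monotone frequency warpings.

    Each test bin i is matched to a reference bin j(i). The warping must
    start at (0, 0), end at the top bins of both spectra, and advance by
    one of the increments in `steps` per test bin (hypothetical constraint).
    """
    n, m = len(test), len(ref)
    inf = math.inf
    # prev[j]: best accumulated squared difference with the current
    # test bin matched to reference bin j.
    prev = [inf] * m
    prev[0] = (test[0] - ref[0]) ** 2  # boundary condition at the low end
    for i in range(1, n):
        cur = [inf] * m
        for j in range(m):
            best = min((prev[j - s] for s in steps if j - s >= 0), default=inf)
            if best < inf:
                cur[j] = best + (test[i] - ref[j]) ** 2
        prev = cur
    return prev[m - 1] / n  # boundary condition at the high end


# Toy spectra: the same peak, shifted by one bin.
ref = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
test = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
unwarped = warped_distance(test, ref, steps=(1,))  # identity mapping only
warped = warped_distance(test, ref)                # warping allowed
```

On these toy spectra the warped distance is smaller than the unwarped one, because the search can realign the shifted peak. In practice `remove_tilt` would be applied to both spectra before calling `warped_distance`.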
Matsumoto, H., & Wakita, H. (1986). Vowel normalization by frequency warped spectral matching. Speech Communication, 5(2), 239–251. https://doi.org/10.1016/0167-6393(86)90011-7