Abstract
The formant structure of three diphthongs, four tense vowels, and three retroflex sounds was examined in detail for possible speaker-identifying features. These sounds were spoken five times each in sentence context by ten speakers of General American on one day and by six of the speakers on a second day at least three weeks later. Formant tracks were computed for each sound under investigation using covariance-type pitch-asynchronous linear prediction together with a root-finding algorithm. The interspeaker variability of about 200 measurements made on these formant tracks was compared initially with intraspeaker variability through the calculation of F ratios. Those with average F ratios greater than 60 were evaluated further with a probability-of-error criterion. Features that are potentially most effective in identifying speakers are the minimum second-formant value in [-r], the maximum first-formant value in [-r], the maximum second-formant values of [o] and [-I], and the minimum third-formant value of [-]. The individual differences apparent in these sounds presumably depend more on speaker habits than on vocal-tract anatomy. The error bound predicted for a speaker identification procedure based on these five features is 0.24%. An identification experiment using only the best two features gave 12 errors out of 80 identifications.
Subject Classification: [43]70.65, [43]70.40.
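The F-ratio screening described in the abstract compares how much a measurement varies between speakers with how much it varies across repetitions by the same speaker. The abstract does not give the exact formula used, so the following is only a minimal sketch that assumes the standard one-way analysis-of-variance F ratio (between-speaker mean square divided by within-speaker mean square). The function name `f_ratio` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F ratio for a single feature.

    groups[i] holds the repeated measurements of the feature for speaker i.
    Assumption: the standard ANOVA definition; the paper's exact formula
    is not stated in the abstract.
    """
    k = len(groups)                          # number of speakers
    n_total = sum(len(g) for g in groups)    # total number of measurements
    grand = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]

    # Between-speaker sum of squares: spread of speaker means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-speaker sum of squares: spread of repetitions around each speaker's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical toy data: three speakers, three repetitions each.
toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f_ratio(toy))
```

A large value indicates that speakers differ from one another far more than any one speaker differs from repetition to repetition, which is the property the study uses to shortlist features.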
Cite
Goldstein, U. G. (1976). Speaker-identifying features based on formant tracks. The Journal of the Acoustical Society of America, 59(1), 176–182. https://doi.org/10.1121/1.380837