We describe a new approach to speaker verification which, like Joint Factor Analysis, is based on a generative model of speaker and channel effects but differs from Joint Factor Analysis in several respects. Firstly, each utterance is represented by a low dimensional feature vector, rather than by a high dimensional set of Baum-Welch statistics. Secondly, heavy-tailed distributions are used in place of Gaussian distributions in formulating the model, so that the effect of outlying data is diminished, both in training the model and at recognition time. Thirdly, the likelihood ratio used for making verification decisions is calculated (using variational Bayes) in a way which is fully consistent with the modeling assumptions and the rules of probability. Finally, experimental results show that, in the case of telephone speech, these likelihood ratios do not need to be normalized in order to set a trial-independent threshold for verification decisions. We report results on female speakers for several conditions in the NIST 2008 speaker recognition evaluation data, including microphone as well as telephone speech. As measured both by equal error rates and the minimum values of the NIST detection cost function, the results on telephone speech are about 30% better than we have achieved using Joint Factor Analysis.