We describe a new approach to speaker verification which, like Joint Factor Analysis, is based on a generative model of speaker and channel effects but differs from Joint Factor Analysis in several respects. Firstly, each utterance is represented by a low-dimensional feature vector, rather than by a high-dimensional set of Baum-Welch statistics. Secondly, heavy-tailed distributions are used in place of Gaussian distributions in formulating the model, so that the effect of outlying data is diminished, both in training the model and at recognition time. Thirdly, the likelihood ratio used for making verification decisions is calculated (using variational Bayes) in a way which is fully consistent with the modeling assumptions and the rules of probability. Finally, experimental results show that, in the case of telephone speech, these likelihood ratios do not need to be normalized in order to set a trial-independent threshold for verification decisions. We report results on female speakers for several conditions in the NIST 2008 speaker recognition evaluation data, including microphone as well as telephone speech. As measured both by equal error rates and the minimum values of the NIST detection cost function, the results on telephone speech are about 30% better than we have achieved using Joint Factor Analysis.
CITATION STYLE
Kenny, P. (2010). Bayesian speaker verification with heavy tailed priors. In Proc. Odyssey Speaker and Language Recognition Workshop, Brno, Czech Republic.