Speaker recognition identifies a person from the identity information carried in the voice. It offers convenient data collection, low collection cost, and high recognition accuracy. However, a short utterance carries little speaker information, so voiceprint recognition performance degrades rapidly as utterances get shorter. To extract sufficient feature information from short utterances, we propose a recognition model based on SincNet. In the feature extraction layer, the model uses a set of learnable sinc-based filter banks to extract features directly from the raw speech waveform, which lets the neural network discover more valuable voiceprint information. In the pooling layer, we design a dual-attention pooling method that combines a multi-head self-attention mechanism with a self-attention mechanism; this enriches the feature information and sharpens the distinction between key features, compensating for the limited information in short speech. We choose ArcFace as the loss function, which maximizes the classification margin in angular space and thus improves the classification ability of the model. Experimental results demonstrate that the proposed model performs better than the baseline model.
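To make two of the components named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It builds one sinc-based band-pass kernel of the kind a SincNet-style front end parameterizes by two cutoff frequencies, and it applies an ArcFace-style additive angular margin to the target-class cosine. The function names, kernel size, sample rate, window choice, scale, and margin are all illustrative assumptions; in a trained model the cutoffs would be learnable parameters rather than fixed numbers.

```python
import math


def sinc_bandpass(f1_hz, f2_hz, kernel_size=101, sample_rate=16000):
    """Hamming-windowed band-pass kernel: difference of two sinc low-pass filters.

    f1_hz and f2_hz are the low and high cutoffs (assumed fixed here; a
    SincNet-style layer would learn them). All defaults are illustrative.
    """
    if not (0 <= f1_hz < f2_hz <= sample_rate / 2):
        raise ValueError("need 0 <= f1_hz < f2_hz <= sample_rate / 2")
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd and at least 3")
    half = (kernel_size - 1) // 2
    f1 = f1_hz / sample_rate  # normalized cutoffs in cycles per sample
    f2 = f2_hz / sample_rate
    kernel = []
    for i in range(kernel_size):
        n = i - half  # centered tap index
        if n == 0:
            ideal = 2.0 * (f2 - f1)  # limit of the expression below as n -> 0
        else:
            ideal = (math.sin(2 * math.pi * f2 * n)
                     - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        kernel.append(ideal * window)
    return kernel


def gain_at(kernel, freq_hz, sample_rate=16000):
    """Magnitude of the kernel's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(k * math.cos(w * i) for i, k in enumerate(kernel))
    im = sum(k * math.sin(w * i) for i, k in enumerate(kernel))
    return math.hypot(re, im)


def arcface_logits(cosines, target, scale=30.0, margin=0.5):
    """Scaled logits with an additive angular margin on the target class.

    cosines[j] is the cosine between the embedding and the class-j weight.
    Simplified: no special handling when theta + margin exceeds pi.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            c = math.cos(math.acos(c) + margin)  # push the target angle outward
        logits.append(scale * c)
    return logits


if __name__ == "__main__":
    k = sinc_bandpass(300.0, 3000.0)
    print("gain in passband (1 kHz):", round(gain_at(k, 1000.0), 3))
    print("gain in stopband (6 kHz):", round(gain_at(k, 6000.0), 3))
    print("logits:", arcface_logits([0.5, 0.2], target=0))
```

Each filter is fully described by its two cutoffs, so a bank of such filters has far fewer free parameters than an unconstrained convolution of the same length. The margin lowers the target-class logit during training, which forces embeddings of the same speaker to cluster more tightly in angle before the loss is satisfied.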
CITATION STYLE
Guo, M., Yang, J., & Gao, S. (2021). Speaker recognition method for short utterance. In Journal of Physics: Conference Series (Vol. 1827). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1827/1/012158