Forensic speaker recognition: A new method based on extracting accent and language information from short utterances

Sajid Saleem; Fazli Subhan; Noman Naseer; Abdul Bais; Ammara Imtiaz

Journal Article

Forensic Science International: Digital Investigation (2020) 34

DOI: 10.1016/j.fsidi.2020.300982

24 Citations

56 Readers

Abstract

This paper presents a new method for Forensic Speaker Recognition (FSR) based on extracting accent and language information from short utterances. Accent Classification (AC) and Language Identification (LI) play an important role in identifying people from different groups, communities and origins, because speaking styles and native languages differ among them. In a multilingual society, forensic experts use AC and LI to narrow the search space for suspect recognition to regional and ethnic groups. In this paper, we use several baseline and deep learning methods to automate this process. The baseline methods are Gaussian Mixture Model-Universal Background Model (GMM-UBM), i-vector and Gaussian Mixture Model-Support Vector Machine (GMM-SVM), all of which use Mel-Frequency Cepstral Coefficients (MFCC) as speech features. The deep learning methods are based on Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN). For CNN, the recently proposed VGGVox and GMM-CNN methods are used, both of which operate on speech spectrograms. For DNN, the x-vectors method is used, which is based on DNN embeddings. The experimental results show that GMM-SVM achieves better FSR performance than the GMM-UBM and i-vector methods, while the x-vectors method performs better than GMM-CNN, VGGVox and GMM-SVM. The x-vectors method alone achieves 80.4% FSR accuracy. With AC, it achieves 85.4% accuracy; with LI, 90.2%; and with AC and LI combined, 95.1%. This shows that the proposed method based on AC and LI gives promising results.
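The search-space reduction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Candidate` record, the `narrow` and `best_match` helpers, the toy labels and the two-dimensional embeddings are all hypothetical, and cosine scoring is an assumption rather than something the abstract specifies. The sketch assumes that accent and language labels have already been predicted by separate AC and LI classifiers and that each enrolled speaker has a fixed-length embedding (for example, an x-vector).

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical enrolled speaker with predicted accent/language labels."""
    speaker_id: str
    accent: str
    language: str
    embedding: List[float]  # stands in for an x-vector-style embedding


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def narrow(candidates: List[Candidate],
           accent: Optional[str] = None,
           language: Optional[str] = None) -> List[Candidate]:
    """Keep only candidates whose labels match the given accent and/or language."""
    return [
        c for c in candidates
        if (accent is None or c.accent == accent)
        and (language is None or c.language == language)
    ]


def best_match(query: List[float],
               candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the candidate most similar to the query, or None if the list is empty."""
    return max(candidates, key=lambda c: cosine(query, c.embedding), default=None)


# Toy data: all labels and vectors are made up for illustration.
candidates = [
    Candidate("spk1", "accent_A", "lang_X", [0.9, 0.1]),
    Candidate("spk2", "accent_B", "lang_Y", [1.0, 0.0]),
    Candidate("spk3", "accent_A", "lang_Y", [0.5, 0.5]),
]
query = [1.0, 0.0]

# Search over everyone, then over the subset matching the predicted labels.
unrestricted = best_match(query, candidates)
restricted = best_match(query, narrow(candidates, "accent_A", "lang_X"))
print(unrestricted.speaker_id, restricted.speaker_id)
```

In this toy setup, a candidate from a different accent and language group scores highest when every enrolled speaker is searched, and filtering by the predicted labels removes that candidate before scoring. The sketch only illustrates the idea of narrowing the candidate set; it says nothing about how the reported accuracies were obtained.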

Citation (APA)

Saleem, S., Subhan, F., Naseer, N., Bais, A., & Imtiaz, A. (2020). Forensic speaker recognition: A new method based on extracting accent and language information from short utterances. Forensic Science International: Digital Investigation, 34. https://doi.org/10.1016/j.fsidi.2020.300982
