Audio Replay Attack Detection for Speaker Verification System Using Convolutional Neural Networks



Abstract

An audio replay attack is one of the most popular spoofing attacks on speaker verification systems because it is inexpensive and requires little knowledge of signal processing. In this paper, we investigate the significance of non-voiced audio segments and deep learning models such as Convolutional Neural Networks (CNNs) for audio replay attack detection. The non-voiced segments of the audio can be used to detect reverberation and channel noise. FFT spectrograms are generated and given as input to a CNN, which classifies the audio as genuine or replayed. Because the voiced speech is removed, the feature vector is smaller without losing the necessary features, which significantly reduces the training time of the networks. The ASVspoof 2017 dataset is used to train and evaluate the model, and the Equal Error Rate (EER) is used as the metric of model performance. The proposed system achieves an EER of 5.62% on the development dataset and 12.47% on the evaluation dataset.
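
For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate (replayed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of one common way to estimate it from classifier scores, not the authors' evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely genuine", and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(genuine_scores, replay_scores):
    """Estimate the EER from two sets of scores.

    Assumption: a higher score means the trial is more likely genuine.
    Returns (eer, threshold), where eer is a fraction in [0, 1].
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    replay = np.asarray(replay_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, replay]))

    # False rejection rate: genuine trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: replay trials scored at or above the threshold.
    far = np.array([(replay >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores (made up for illustration, not from the paper).
genuine = [0.9, 0.8, 0.7, 0.3]
replay = [0.1, 0.2, 0.4, 0.6]
eer, threshold = equal_error_rate(genuine, replay)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On these toy scores one genuine trial out of four is rejected and one replay trial out of four is accepted at the chosen threshold, so both error rates are 25%. With finite data the two rates rarely cross exactly, which is why the sketch averages them at the closest point.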

Cite

Kemanth, P. J., Supanekar, S., & Koolagudi, S. G. (2019). Audio Replay Attack Detection for Speaker Verification System Using Convolutional Neural Networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11942 LNCS, pp. 445–453). Springer. https://doi.org/10.1007/978-3-030-34872-4_49
