Speaker recognition is the process of recognizing a speaker from speaker-specific information. Speaker recognition systems can be classified as text-dependent or text-independent. In a text-dependent system, the recognition phrases are fixed or known beforehand; for example, the user can be prompted to read a randomly selected sequence of numbers. In a text-independent system, by contrast, there are no constraints on the words speakers are allowed to use, so what is spoken in training and what is uttered in actual use may have completely different content. The domain of speaker recognition can be further divided into speaker identification and speaker verification. Speaker verification evaluates whether a voice belongs to a particular claimed person, while speaker identification determines which person it belongs to. In this paper, Mel-frequency cepstral coefficients (MFCCs) were extracted from the audio files and fed to a convolutional neural network (CNN). The CNN was then optimized to increase model accuracy. Over six runs with varying parameters, a maximum accuracy of approximately 97% was achieved.
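The abstract does not state the feature-extraction settings, so the following is only a minimal NumPy sketch of what MFCC extraction typically involves: framing, windowing, a power spectrum, a mel filterbank, a logarithm, and a DCT. The function names (`mfcc`, `mel_filterbank`) and every parameter value (16 kHz sampling rate, 512-sample frames, 160-sample hop, 40 mel filters, 13 coefficients) are assumptions chosen for illustration, not the authors' configuration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):      # rising slope
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fb[i, k] = (right - k) / (right - center)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix; all defaults are assumed values."""
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel-band energies, then log compression (epsilon avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the mel axis; keep the first n_mfcc coefficients.
    n = np.arange(n_mels)
    basis = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ basis.T

# Usage on one second of synthetic noise standing in for an audio file.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
print(features.shape)  # one row per frame, one column per coefficient
```

A matrix of this shape (frames by coefficients) is the kind of two-dimensional input a CNN could treat like a single-channel image; how the paper actually shaped its network input is not stated in the abstract.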
Srivastava, S., Chaudhary, G., & Shukla, C. (2021). Text-Independent Speaker Recognition Using Deep Learning. In EAI/Springer Innovations in Communication and Computing (pp. 41–51). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-76167-7_2