Automatically recognizing a speaker's metadata, beyond his or her identity alone, is a challenging task. Such metadata gives rich behavioral characteristics of a person. Most work in speaker recognition has been done on low-level spectral features. These give good accuracy with low error, but they ignore other information about the speaker, and their performance degrades under spectral, session, and channel variations. State-of-the-art systems for text-independent speaker identification use Mel-frequency cepstral coefficients (MFCCs) as their main features. Such systems generally perform very well under clean conditions and acceptably under matched conditions. Under mismatched conditions, however, performance deteriorates significantly. One of the principal reasons is the nature of low-level features: being spectral, they are susceptible to spectral variations caused by noise and channel effects. Prosodic features have been used successfully under these variations as well as in the presence of noise. This paper considers a multi-SNR environment. Recognition accuracy is calculated at SNR levels of 15 dB, 25 dB, and 35 dB, and results are also tested with different types of noise: traffic noise, cockpit noise, babble noise, and fan noise. Combining prosodic features such as pitch, energy, and formants is found to give improved performance.
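The abstract does not say how the noisy test conditions were produced. As an illustration only, one common way to build a multi-SNR test set is to scale a noise recording so that the speech-to-noise power ratio matches a target value before adding it to the clean signal. The sketch below assumes that approach; the function names (`mix_at_snr`, `measured_snr_db`), the 150 Hz tone standing in for voiced speech, and the Gaussian noise standing in for recorded traffic, cockpit, babble, or fan noise are all hypothetical and not taken from the paper.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a non-empty sequence of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the target SNR in dB.

    Assumption: SNR is defined over the whole utterance as
    10 * log10(speech power / noise power).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # Noise power required to reach the requested SNR.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean signal is known."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Synthetic stand-ins (assumptions): 1 s at 8 kHz of a 150 Hz tone and white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 150 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

# The three SNR levels named in the abstract.
for snr in (15, 25, 35):
    noisy = mix_at_snr(speech, noise, snr)
    print(f"target {snr} dB, measured {measured_snr_db(speech, noisy):.2f} dB")
```

In practice the same scaling would be applied to real speech and to each recorded noise type, and the utterance-level power definition could be replaced by an active-speech-level measure if silence makes up much of the recording.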
Jagdale, S. M., Shinde, A. A., & Chitode, J. S. (2019). Text independent speaker identification with prosody features in presence of noise. International Journal of Innovative Technology and Exploring Engineering, 8(9 Special Issue 3), 124–127. https://doi.org/10.35940/ijitee.i3025.0789s319