Mel-weighted single frequency filtering spectrogram for dialect identification

14Citations
Citations of this article
38Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

In this study, we propose Mel-weighted single frequency filtering (SFF) spectrograms for dialect identification. The spectrum derived using SFF has high spectral resolution for harmonics and resonances while simultaneously maintaining good time-resolution of some speech excitation features such as impulse-like events. The SFF spectrum can represent speech characteristics such as burst time and glottal closure instances better than the short-time Fourier transform (STFT) spectrum. Our hypothesis is that these intricate representations in the SFF spectrum should help in distinguishing dialects. Therefore, we built a dialect identification system which uses an unsupervised, bottleneck feature representation of the Mel-weighted SFF spectrogram (Mel-SFF spectrogram) with sequence-to-sequence deep autoencoders. The language invariance of the proposed system was evaluated using two datasets: the UT-Podcast database (English) and the STYRIALECT database (German). The proposed representations gave a relative improvement of 9.47% and 4.69% in unweighted average recall (UAR) compared to the best baseline method on the development and test datasets, respectively, of the UT-Podcast database. The proposed representations also gave a comparable performance to the best baseline method for the STYRIALECT database. In addition, the fusion of the autoencoder bottleneck features computed from the Mel-SFF and Mel-STFT spectrograms improved the overall performance indicating complementary information between these features. By further analyzing the performance of the proposed representation with different utterance lengths using the UT-Podcast database, we observed that the proposed representation performed better on short utterances. The improved performance given by the Mel-weighted SFF spectrogram for recognizing dialects in both databases supports our hypothesis.

Cite

CITATION STYLE

APA

Kethireddy, R., Kadiri, S. R., Alku, P., & Gangashetty, S. V. (2020). Mel-weighted single frequency filtering spectrogram for dialect identification. IEEE Access, 8, 174871–174879. https://doi.org/10.1109/ACCESS.2020.3020506

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free