Speech recognition using multiscale scattering of audio signals and long short-term memory of neural networks

Abstract

Communication is one of the key elements of interaction. To understand human speech, machines use various techniques to convert audio into a machine-readable form, a task known as speech recognition. This paper addresses one of the classic problems in the speech recognition domain: spoken-digit recognition. Recognition is performed with wavelet scattering, which first extracts useful information from the audio signals and then passes it to a Long Short-Term Memory (LSTM) network that classifies them. A major advantage of the LSTM is that it mitigates the vanishing gradient problem. The proposed technique can be used in applications such as numerical data entry for blind people. The method achieves higher accuracy than standard approaches that use Mel-frequency cepstral coefficients (MFCC) with an LSTM network to recognize digits. The work thus meets its main objective: validating the effectiveness of wavelet scattering combined with LSTM networks for spoken-digit recognition.
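The pipeline described in the abstract (wavelet scattering features fed frame by frame into an LSTM that outputs one of ten digit classes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, filter widths, frame length, and hidden size are all hypothetical, the "scattering" is only a toy first-order version (modulus of a Morlet-like band-pass convolution followed by a frame average), and the LSTM weights are random and untrained, so the output probabilities carry no meaning beyond showing the data flow.

```python
import cmath
import math
import random


def morlet(freq, width):
    """Gaussian-windowed complex exponential centred at `freq` cycles/sample (assumed filter shape)."""
    half = 3 * width
    return [math.exp(-0.5 * (t / width) ** 2) * cmath.exp(2j * math.pi * freq * t)
            for t in range(-half, half + 1)]


def first_order_scattering(signal, freqs, width, frame_len):
    """Toy first-order scattering: per frame, the mean of |signal * wavelet| for each band."""
    n = len(signal)
    envelopes = []
    for f in freqs:
        psi = morlet(f, width)
        half = len(psi) // 2
        env = []
        for i in range(n):
            acc = 0j
            for k, w in enumerate(psi):
                j = i - (k - half)          # convolution index, zero-padded at the edges
                if 0 <= j < n:
                    acc += signal[j] * w
            env.append(abs(acc))            # modulus non-linearity
        envelopes.append(env)
    # Low-pass step approximated by a plain average over non-overlapping frames.
    return [[sum(env[s:s + frame_len]) / frame_len for env in envelopes]
            for s in range(0, n - frame_len + 1, frame_len)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W):
    """One LSTM step; W[gate] holds rows of length len(x) + len(h) + 1 (bias last)."""
    z = x + h + [1.0]
    pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in W[gate]] for gate in "ifog"}
    i_gate = [sigmoid(a) for a in pre["i"]]
    f_gate = [sigmoid(a) for a in pre["f"]]
    o_gate = [sigmoid(a) for a in pre["o"]]
    cand = [math.tanh(a) for a in pre["g"]]
    # Additive cell update: the path through c is what eases gradient flow over time.
    c_new = [f * ci + i * g for f, ci, i, g in zip(f_gate, c, i_gate, cand)]
    h_new = [o * math.tanh(ci) for o, ci in zip(o_gate, c_new)]
    return h_new, c_new


def classify(frames, W, W_out):
    """Run the LSTM over all frames, then apply a softmax layer to the last hidden state."""
    hidden = len(W["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in frames:
        h, c = lstm_step(x, h, c, W)
    logits = [sum(w * v for w, v in zip(row, h + [1.0])) for row in W_out]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Demo on a synthetic tone (stand-in for a spoken digit); weights are random, not trained.
rng = random.Random(0)
freqs = [0.05, 0.1, 0.2]                     # hypothetical band centres, cycles/sample
signal = [math.sin(2 * math.pi * 0.1 * t) for t in range(256)]
frames = first_order_scattering(signal, freqs, width=8, frame_len=64)
hidden = 4
W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(len(freqs) + hidden + 1)]
            for _ in range(hidden)] for gate in "ifog"}
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)] for _ in range(10)]
probs = classify(frames, W, W_out)           # one probability per digit 0..9
```

In a real system the weights would be learned from labelled recordings, and the scattering transform would typically include second-order coefficients and a properly designed filter bank rather than the three hand-picked bands used here.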

Citation (APA)

Mahalingam, H., & Rajakumar, M. E. M. P. (2019). Speech recognition using multiscale scattering of audio signals and long short-term memory of neural networks. International Journal of Innovative Technology and Exploring Engineering, 8(11), 2955–2961. https://doi.org/10.35940/ijitee.K2270.0981119
