Trigger Word Recognition using LSTM

  • Kalyanam Supriya

Abstract

A trigger word is a word used to wake up a virtual voice assistant, for example "Hey Siri" or "Hey Alexa". Training a model to detect one first requires good labelled training data. We could record 10-second audio clips of people saying positive examples ("activate" in this case) and negative examples (words other than "activate"), then manually label the moments at which the trigger word was spoken. Labelling the data manually is complex and time-consuming, so the training data is instead generated artificially. This requires three types of audio clips: (1) positive examples of people saying the word "activate", 1 or 2 seconds each; (2) negative examples of people saying random words, 1 or 2 seconds each; and (3) background noise, for example from a coffee shop or office, 10 seconds each. The generated training data needs to be pre-processed before it is sent to a machine learning model. Sound is produced by variations in air pressure, which a microphone records as an audio signal. The input to the model is the spectrogram of each generated audio clip, and the target is the set of labels created earlier. In recent years, Deep Learning (DL) has attracted increasing attention in industry and academia for its high performance in various domains. Today, Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) are the most popular DL architectures. We perform Trigger Word Recognition on speech data using Long Short-Term Memory (LSTM).
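The artificial data generation described in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the author's implementation. The sample rate, the number of label steps, the choice to mark a few steps right after each trigger word ends, and the names `pick_segment` and `synthesize_example` are all assumptions made for the example.

```python
import random

SAMPLE_RATE = 1000   # samples per second (assumed; kept small for the sketch)
CLIP_SECONDS = 10    # background length stated in the abstract
LABEL_STEPS = 100    # number of label time steps per clip (assumed)
LABEL_SPAN = 5       # steps set to 1 after a trigger word ends (assumed)


def pick_segment(clip_len, total_len, taken, rng, max_tries=50):
    """Return a (start, end) sample range that overlaps nothing in `taken`, or None."""
    if clip_len > total_len:
        return None
    for _ in range(max_tries):
        start = rng.randint(0, total_len - clip_len)
        end = start + clip_len
        if all(end <= s or start >= e for s, e in taken):
            return start, end
    return None


def synthesize_example(background, positives, negatives, rng):
    """Overlay word clips on a background clip and build per-step labels."""
    audio = list(background)
    labels = [0] * LABEL_STEPS
    taken = []
    clips = [(c, True) for c in positives] + [(c, False) for c in negatives]
    for clip, is_trigger in clips:
        segment = pick_segment(len(clip), len(audio), taken, rng)
        if segment is None:
            continue  # no free slot found; skip this clip
        start, end = segment
        taken.append(segment)
        for i, sample in enumerate(clip):
            audio[start + i] += sample
        if is_trigger:
            # Label the steps just after the trigger word ends (an assumed
            # labelling scheme; the abstract does not specify one).
            end_step = end * LABEL_STEPS // len(audio)
            for step in range(end_step, min(end_step + LABEL_SPAN, LABEL_STEPS)):
                labels[step] = 1
    return audio, labels


# Usage with stand-in signals instead of real recordings.
rng = random.Random(0)
background = [0.0] * (SAMPLE_RATE * CLIP_SECONDS)
activate = [0.5] * SAMPLE_RATE   # stand-in for a 1-second "activate" clip
other = [0.2] * SAMPLE_RATE      # stand-in for a 1-second negative word
audio, labels = synthesize_example(background, [activate], [other], rng)
```

Because clip positions are drawn at random and the labels are written at the same time, each generated example comes with its labels for free, which is the advantage over manual labelling that the abstract points to. In practice the mixed audio would then be converted to a spectrogram before being passed to the model.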

Cite

Kalyanam Supriya. (2020). Trigger Word Recognition using LSTM. International Journal of Engineering Research And, V9(06). https://doi.org/10.17577/ijertv9is060092
