Trigger Word Recognition using LSTM

Journal Article, Open Access

Kalyanam Supriya

International Journal of Engineering Research and (2020) V9(06)

DOI: 10.17577/ijertv9is060092

Abstract

A trigger word is a word used to wake a virtual voice assistant, for example "Hey Siri" or "Hey Alexa". Training a model to detect one first requires good labelled training data. We could record 10-second audio clips of people saying positive examples ("activate" in this case) and negative examples (words other than "activate"), and then manually label the points at which the trigger word was spoken. Manual labelling is complex and time-consuming, so the training data is instead generated artificially. This requires three types of audio clips:

1. Positive examples of people saying the word "activate", 1 or 2 seconds each
2. Negative examples of people saying random words, 1 or 2 seconds each
3. Background noise, for example a coffee shop or an office, 10 seconds each

The generated training data must be pre-processed before it is passed to a machine learning model. Sound is produced by variations in air pressure. The input to the model is the spectrogram of each generated audio clip, and the target is the set of labels created when that clip was generated.

In recent years, Deep Learning (DL) has attracted increasing attention in industry and academia for its high performance in various domains. Today, Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) are the most popular DL architectures. We perform trigger word recognition on speech data using Long Short-Term Memory (LSTM).
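The abstract describes generating labelled training clips artificially but does not spell out the procedure. The following is a minimal sketch of one way it could work, not the author's implementation. It assumes that the short positive and negative clips are added onto the background at random, non-overlapping positions, and that the label sequence is set to 1 for a few steps right after each positive clip ends. The function names, the `ones_after` parameter, and the use of plain Python lists in place of real audio samples are all hypothetical.

```python
import random


def overlaps(segment, taken):
    """Return True if the (start, end) segment intersects any segment in taken."""
    start, end = segment
    return any(start <= t_end and end >= t_start for t_start, t_end in taken)


def pick_segment(clip_len, total_len, taken, rng, max_tries=1000):
    """Pick a random free (start, end) slot for a clip, or None if none is found."""
    if clip_len > total_len:
        return None
    for _ in range(max_tries):
        start = rng.randint(0, total_len - clip_len)  # inclusive bounds
        segment = (start, start + clip_len - 1)
        if not overlaps(segment, taken):
            return segment
    return None


def synthesize(background, positives, negatives, n_labels, ones_after=5, seed=0):
    """Overlay clips on a background and build the label sequence.

    Assumption: labels are 1 for `ones_after` steps after a positive clip ends,
    and 0 everywhere else. Negative clips never change the labels.
    """
    rng = random.Random(seed)
    mix = list(background)          # copy so the background is not modified
    labels = [0] * n_labels
    taken = []                      # sample ranges already occupied by a clip
    total = len(mix)

    clips = [(c, True) for c in positives] + [(c, False) for c in negatives]
    for clip, is_positive in clips:
        segment = pick_segment(len(clip), total, taken, rng)
        if segment is None:
            continue                # no free slot found; skip this clip
        taken.append(segment)
        start, end = segment
        for i, sample in enumerate(clip):
            mix[start + i] += sample
        if is_positive:
            # Map the last audio sample of the clip to a label time step.
            end_step = end * n_labels // total
            for j in range(end_step + 1, min(end_step + 1 + ones_after, n_labels)):
                labels[j] = 1
    return mix, labels


if __name__ == "__main__":
    background = [0.0] * 1000       # stands in for 10 seconds of background noise
    positives = [[1.0] * 100]       # stands in for one "activate" clip
    negatives = [[-1.0] * 150]      # stands in for one random-word clip
    mix, labels = synthesize(background, positives, negatives, n_labels=100)
    print(len(mix), len(labels), sum(labels))
```

In a real pipeline the lists would be audio sample arrays, and the mixed clip would be converted to a spectrogram before being paired with `labels` as the model target.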

Cite

Kalyanam Supriya. (2020). Trigger Word Recognition using LSTM. International Journal of Engineering Research And, V9(06). https://doi.org/10.17577/ijertv9is060092
