FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition

Seong Su Lim; Oh Wook Kwon

Journal Article, Open Access

Applied Sciences (Switzerland) (2022) 12(15)

DOI: 10.3390/app12157619

Abstract

As deep learning-based speech recognizers have shifted to end-to-end architectures, increasing the effective amount of training data has become an important issue. To address it, various data augmentation techniques that create additional training data by transforming labeled data have been studied. We propose a method called FrameAugment that augments data by changing the speed of speech locally for selected sections, unlike the conventional speed perturbation technique, which changes the speed uniformly over the entire utterance. To change the speed of the selected sections, the number of frames in each randomly selected section is adjusted through linear interpolation in the spectrogram domain. The proposed method is shown to achieve 6.8% better performance than the baseline on the WSJ database and 9.5% better performance than the baseline on the LibriSpeech database. It is also confirmed that the proposed method further improves speech recognition performance when combined with previous data augmentation techniques.
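The local speed change described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-section selection, and the parameters `max_length`, `min_rate`, and `max_rate` are all assumptions made for illustration, and the spectrogram is represented as a plain list of frames (each frame a list of floats) to keep the example dependency-free.

```python
import random


def resample_frames(frames, new_len):
    """Linearly interpolate a sequence of feature frames to new_len frames."""
    old_len = len(frames)
    if old_len < 1 or new_len < 1:
        raise ValueError("both the input and output must have at least one frame")
    if old_len == 1 or new_len == 1:
        return [list(frames[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        # Fractional position of output frame i on the input time axis.
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def change_section_speed(spec, start, length, rate):
    """Change the speed of spec[start:start+length] by `rate`, leaving the rest as is.

    rate > 1 shortens the section (faster speech); rate < 1 lengthens it.
    """
    section = spec[start:start + length]
    if not section:
        return list(spec)
    new_len = max(1, round(len(section) / rate))
    return spec[:start] + resample_frames(section, new_len) + spec[start + length:]


def random_local_speed(spec, max_length=20, min_rate=0.5, max_rate=2.0, rng=None):
    """Pick one random section and one random rate (hypothetical parameter choices)."""
    rng = rng or random.Random()
    length = rng.randint(1, min(max_length, len(spec)))
    start = rng.randint(0, len(spec) - length)
    rate = rng.uniform(min_rate, max_rate)
    return change_section_speed(spec, start, length, rate)


# Toy "spectrogram": 10 frames with a single feature each.
spec = [[float(t)] for t in range(10)]
faster = change_section_speed(spec, start=2, length=4, rate=2.0)
slower = change_section_speed(spec, start=2, length=4, rate=0.5)
print(len(spec), len(faster), len(slower))
```

Because only the selected section is resampled, frames outside it are passed through unchanged, which is what distinguishes this from uniform speed perturbation of the whole utterance. How many sections are selected and how their lengths and rates are drawn are details the abstract does not specify.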

Citation (APA)

Lim, S. S., & Kwon, O. W. (2022). FrameAugment: A Simple Data Augmentation Method for Encoder–Decoder Speech Recognition. Applied Sciences (Switzerland), 12(15). https://doi.org/10.3390/app12157619