A Perceptually Inspired Data Augmentation Method for Noise Robust CNN Acoustic Models

Abstract

Here, we present a data augmentation method that improves the robustness of convolutional neural network-based speech recognizers to additive noise. The proposed technique has its roots in the input dropout method because it discards a subset of the input features. However, instead of doing this in a completely random fashion, we introduce two simple heuristics that select the less reliable components of the spectrum of the speech signal as candidates for dropout. The first heuristic retains spectro-temporal maxima, while the second is based on a crude estimation of spectral dominance. The selected components are always retained, while the dropout step discards or retains the unselected ones in a probabilistic manner. Due to the randomness involved in dropout, the whole process may be interpreted as a data augmentation method that perturbs the data by creating new data instances from the existing ones on the fly. We evaluated the method on the Aurora-4 corpus using only the clean training data set and obtained relative word error rate reductions between 22% and 25%.
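
The first heuristic described in the abstract (always keep spectro-temporal maxima, drop the remaining components at random) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 3x3 neighbourhood used to define a maximum, the drop probability, the fill value of 0.0, and the function names `local_maxima_mask` and `selective_input_dropout` are all assumptions made for illustration. The second heuristic (spectral dominance) is not covered here because the abstract does not specify it.

```python
import numpy as np


def local_maxima_mask(spec: np.ndarray) -> np.ndarray:
    """Mark bins that are >= all 8 time-frequency neighbours.

    Assumption: a "spectro-temporal maximum" is a maximum within a
    3x3 window; the paper may define it differently.
    """
    spec = np.asarray(spec, dtype=float)
    n_t, n_f = spec.shape
    # Pad with -inf so that border bins are compared only to real neighbours.
    padded = np.pad(spec, 1, mode="constant", constant_values=-np.inf)
    mask = np.ones(spec.shape, dtype=bool)
    for dt in (-1, 0, 1):
        for df in (-1, 0, 1):
            if dt == 0 and df == 0:
                continue
            neighbour = padded[1 + dt:1 + dt + n_t, 1 + df:1 + df + n_f]
            mask &= spec >= neighbour
    return mask


def selective_input_dropout(spec, drop_prob=0.5, rng=None, fill_value=0.0):
    """Drop non-maximum bins with probability drop_prob; keep maxima always.

    drop_prob and fill_value are hypothetical settings, not values
    taken from the paper.
    """
    spec = np.asarray(spec, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    keep_always = local_maxima_mask(spec)
    # Only unselected (non-maximum) bins are candidates for dropout.
    dropped = (rng.random(spec.shape) < drop_prob) & ~keep_always
    out = spec.copy()
    out[dropped] = fill_value
    return out, keep_always


# Usage: each call with a fresh random draw yields a new perturbed copy
# of the same utterance, which is what makes this a data augmentation.
features = np.random.default_rng(0).normal(size=(100, 40))  # frames x bands
augmented, kept = selective_input_dropout(features, drop_prob=0.5)
```

Because the dropout mask is redrawn on every call, applying the function once per training epoch would produce a different variant of each utterance each time, in line with the "on the fly" interpretation given in the abstract.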

Cite

Tóth, L., Kovács, G., & Van Compernolle, D. (2018). A Perceptually Inspired Data Augmentation Method for Noise Robust CNN Acoustic Models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11096 LNAI, pp. 697–706). Springer Verlag. https://doi.org/10.1007/978-3-319-99579-3_71
