Noise that degrades the speech signal remains a major problem for automatic speech recognition. One promising way to tackle this problem is multi-task learning, in which an auxiliary task is trained alongside the main speech recognition task and whose goal is to help improve the results of the main task. An effective auxiliary task in this setting is clean-speech generation. In this paper, we investigate this idea further by generating features extracted directly from the audio file containing only the noise, instead of the clean speech. After demonstrating that this noise-estimation auxiliary task yields an improvement, we also show that using both the noise and clean-speech estimation auxiliary tasks leads to a 4% relative word error rate improvement over classic single-task learning on the CHiME4 dataset.
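As a rough illustration of the multi-task idea described above, the sketch below combines a main recognition loss with two auxiliary feature-estimation losses (clean speech and noise). It is a minimal sketch under stated assumptions, not the authors' implementation: the choice of cross-entropy and mean squared error, the function names, the auxiliary weights, and all numeric values are hypothetical and not taken from the paper.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct class (main ASR task)."""
    return -math.log(probs[target_index])


def mean_squared_error(predicted, reference):
    """Regression loss for a feature-estimation auxiliary task."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sum(squared) / len(reference)


def multitask_loss(main_loss, auxiliary_losses, auxiliary_weights):
    """Main loss plus a weighted sum of auxiliary losses.

    The weights are hypothetical hyperparameters. With every weight set
    to 0 the objective reduces to single-task learning.
    """
    if len(auxiliary_losses) != len(auxiliary_weights):
        raise ValueError("one weight is required per auxiliary loss")
    weighted = [w * loss for w, loss in zip(auxiliary_weights, auxiliary_losses)]
    return main_loss + sum(weighted)


# Toy values for a single frame (illustrative only, not from the paper).
posteriors = [0.7, 0.2, 0.1]  # assumed network output over 3 acoustic classes
asr_loss = cross_entropy(posteriors, 0)
clean_loss = mean_squared_error([0.5, 1.0], [0.0, 1.0])  # clean-speech features
noise_loss = mean_squared_error([0.2, 0.4], [0.2, 0.0])  # noise-only features

total = multitask_loss(asr_loss, [clean_loss, noise_loss], [0.1, 0.1])
print(f"main={asr_loss:.4f} clean={clean_loss:.4f} noise={noise_loss:.4f} total={total:.4f}")
```

In a setup like this, the auxiliary outputs would typically be used only during training, since the auxiliary tasks exist to help the main task rather than to be used at recognition time.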
CITATION STYLE
Pironkov, G., Dupont, S., Wood, S. U. N., & Dutoit, T. (2017). Noise and speech estimation as auxiliary tasks for robust speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10583 LNAI, pp. 181–192). Springer Verlag. https://doi.org/10.1007/978-3-319-68456-7_15