An analysis of deep neural networks in broad phonetic classes for noisy speech recognition


Abstract

The introduction of Deep Neural Network (DNN) based acoustic models has produced dramatic improvements in performance. In particular, we have recently found that Deep Maxout Networks, a modification of the DNN feed-forward architecture that uses a maxout activation function, provide enhanced robustness to environmental noise. In this paper we further investigate how these improvements are distributed across the different broad phonetic classes, and how the DNN-based systems compare to classical Hidden Markov Model (HMM) based back-ends. Our experiments demonstrate that performance remains tightly related to the particular phonetic class, with stops and affricates being the least resilient, and that the relative improvements of both DNN variants are distributed unevenly across those classes, with the type of noise having a significant influence on the distribution. A combination of the DNN and classical HMM systems is also proposed to validate our hypothesis that traditional GMM/HMM systems make a different type of error than hybrid Deep Neural Network models.
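The maxout activation mentioned above replaces a fixed nonlinearity with the maximum over several learned affine "pieces" per unit. The following is a minimal NumPy sketch of that idea (the array shapes and function name are illustrative, not the paper's exact configuration):

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation for a single input vector.

    x: input, shape (d_in,)
    W: weights for k affine pieces, shape (k, d_out, d_in)
    b: biases for k affine pieces, shape (k, d_out)

    Each output unit computes k affine projections of x and
    keeps the maximum, so the nonlinearity itself is learned.
    """
    z = W @ x + b          # shape (k, d_out): one affine piece per slice
    return z.max(axis=0)   # elementwise max over the k pieces

# With one piece fixed to zero, maxout reduces to a ReLU-like unit:
# max(w.x + b, 0), which is one reason it generalizes common activations.
```

Note that a ReLU is recovered as the special case k = 2 with one piece pinned at zero, which is why maxout is often described as learning the activation function rather than fixing it in advance.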

Citation (APA)

De-La-Calle-Silos, F., Gallardo-Antolín, A., & Peláez-Moreno, C. (2016). An analysis of deep neural networks in broad phonetic classes for noisy speech recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10077 LNAI, pp. 87–96). Springer Verlag. https://doi.org/10.1007/978-3-319-49169-1_9
