Gaussian Mixture Models (GMMs) have been the most commonly used models in pronunciation verification systems. Deep Neural Networks (DNNs), introduced more recently, have been shown to provide significantly better discriminative models of the acoustic space. In this paper, we describe our efforts to upgrade the models of a Computer Aided Pronunciation Learning (CAPL) system that is used to teach Arabic pronunciation following the rules of Quran recitation. Four major enhancements were introduced. First, we used Speaker Adaptive Training (SAT) to reduce inter-speaker variability. Second, we integrated hybrid DNN-HMM models to enhance the acoustic model and decrease the phone error rate. Third, we integrated the Minimum Phone Error (MPE) criterion with the hybrid DNN. Finally, in the testing phase, we used a grammar-based decoding graph to limit the search space to the frequent error types. A comparison between the conventional GMM-HMM and the hybrid DNN-HMM shows significant performance improvements for the hybrid system.
CITATION STYLE
Elaraby, M. S., Abdallah, M., Abdou, S., & Rashwan, M. (2016). A deep neural networks (DNN) based models for a computer aided pronunciation learning system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9811 LNCS, pp. 51–58). Springer Verlag. https://doi.org/10.1007/978-3-319-43958-7_5