Integrated exemplar-based template matching and statistical modeling for continuous speech recognition

This article is free to access.

Abstract

We propose an approach that integrates exemplar-based template matching with statistical modeling to improve continuous speech recognition. The template units are context-dependent phone segments (triphone context), and each frame of a speech template is represented by multiple Gaussian mixture model (GMM) indices. We investigate two local distances, log likelihood ratio (LLR) and Kullback-Leibler (KL) divergence, for dynamic time warping (DTW)-based template matching. To reduce computation and storage complexity, we also propose two methods for template selection: minimum distance template selection (MDTS) and maximum likelihood template selection (MLTS). We further propose fine-tuning the MLTS template representatives with a GMM merging algorithm so that the GMMs better represent the frames of the selected template representatives. Experimental results on the TIMIT phone recognition task and on a large vocabulary continuous speech recognition (LVCSR) task of telehealth captioning show that integrating template matching with statistical modeling significantly improves recognition accuracy over the hidden Markov model (HMM) baselines on both tasks. The template selection methods also provide significant accuracy gains over the HMM baseline while substantially reducing computation and storage complexity. When all templates or MDTS were used, the LLR local distance performed better than the KL local distance. For MLTS and template compression, the KL local distance performed better than the LLR local distance, and template compression further improved recognition accuracy over MLTS at lower computational cost. © 2014 Sun and Zhao; licensee Springer.
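As a rough illustration of DTW-based template matching with a KL-style local distance, the sketch below aligns a template against a test segment. It is not the authors' implementation. It assumes each frame is a discrete probability vector (for example, normalized weights over a small set of Gaussians), uses a symmetrized KL divergence, and applies the textbook three-way DTW step pattern. The abstract does not specify any of these details, and the function names `kl_divergence`, `symmetric_kl`, and `dtw_distance` are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions of equal length.

    eps is an assumed smoothing constant that avoids log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Symmetrized KL divergence, used here as the DTW local distance (assumption)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def dtw_distance(template, test, local_distance=symmetric_kl):
    """Accumulated DTW cost between two frame sequences.

    Each sequence is a list of frames; each frame is a probability vector.
    Allowed steps: diagonal, vertical, horizontal (standard pattern, assumed).
    """
    n, m = len(template), len(test)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning template[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(template[i - 1], test[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both frames
                cost[i - 1][j],      # advance template only
                cost[i][j - 1],      # advance test only
            )
    return cost[n][m]


# Toy frames: two distinct distributions over three classes.
frame_a = [0.8, 0.1, 0.1]
frame_b = [0.1, 0.1, 0.8]

template = [frame_a, frame_b]
stretched = [frame_a, frame_a, frame_b]   # same content, slower
different = [frame_b, frame_b, frame_a]   # different content

same_cost = dtw_distance(template, stretched)
diff_cost = dtw_distance(template, different)
```

In this toy case the time-stretched segment aligns to the template at zero cost, while the segment with different content accumulates a positive cost. In a recognizer, scores of this kind would be computed against many stored templates per triphone, which is why the abstract's template selection methods matter for computation and storage.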

Sun, X., & Zhao, Y. (2014). Integrated exemplar-based template matching and statistical modeling for continuous speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2014. https://doi.org/10.1186/1687-4722-2014-4
