Integrated exemplar-based template matching and statistical modeling for continuous speech recognition

Xie Sun; Yunxin Zhao

Journal Article (Open Access)

EURASIP Journal on Audio, Speech, and Music Processing, vol. 2014 (2014)

DOI: 10.1186/1687-4722-2014-4

Abstract

We propose a novel approach to integrating exemplar-based template matching with statistical modeling to improve continuous speech recognition. We choose context-dependent phone segments (triphone context) as the template unit and use multiple Gaussian mixture model (GMM) indices to represent each frame of a speech template. We investigate two local distances for dynamic time warping (DTW)-based template matching: the log likelihood ratio (LLR) and the Kullback-Leibler (KL) divergence. To reduce computation and storage costs, we also propose two template selection methods: minimum distance template selection (MDTS) and maximum likelihood template selection (MLTS). We further propose fine-tuning the MLTS template representatives with a GMM merging algorithm, so that the GMMs better represent the frames of the selected representatives. Experimental results on the TIMIT phone recognition task and on a large vocabulary continuous speech recognition (LVCSR) task of telehealth captioning showed that integrating template matching with statistical modeling significantly improved recognition accuracy over the hidden Markov model (HMM) baselines on both tasks. The template selection methods also yielded significant accuracy gains over the HMM baseline while greatly reducing computation and storage costs. When all templates or MDTS were used, the LLR local distance outperformed the KL local distance. For MLTS and template compression, the KL local distance outperformed the LLR local distance, and template compression further improved recognition accuracy on top of MLTS at a lower computational cost. © 2014 Sun and Zhao; licensee Springer.
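
The abstract names DTW-based template matching with a KL-divergence local distance, plus a minimum-distance template selection step. The Python sketch below illustrates those generic ideas under simplifying assumptions; it is not the authors' implementation. Each frame is summarized by a single diagonal-covariance Gaussian (the paper instead uses multiple GMM indices per frame, and its exact distance definitions are not given here), the local distance is a symmetrized KL divergence, and `select_min_distance_template` is a hypothetical medoid-style reading of MDTS. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

# ASSUMPTION: a frame is one diagonal Gaussian given as (means, variances).
# The paper represents template frames with multiple GMM indices instead.
Frame = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gauss(p: Frame, q: Frame) -> float:
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Frame, q: Frame) -> float:
    """Symmetrized KL divergence, used here as the DTW local distance."""
    return kl_diag_gauss(p, q) + kl_diag_gauss(q, p)


def dtw_distance(a: List[Frame], b: List[Frame],
                 local: Callable[[Frame, Frame], float]) -> float:
    """Classic DTW with (1,0), (0,1), (1,1) steps; returns the total path cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def select_min_distance_template(templates: List[List[Frame]],
                                 local: Callable[[Frame, Frame], float]) -> int:
    """HYPOTHETICAL MDTS reading: return the index of the template whose
    summed DTW distance to all other templates of the same unit is smallest."""
    best_idx, best_sum = -1, float("inf")
    for i, ti in enumerate(templates):
        s = sum(dtw_distance(ti, tj, local)
                for j, tj in enumerate(templates) if j != i)
        if s < best_sum:
            best_idx, best_sum = i, s
    return best_idx


if __name__ == "__main__":
    def g(mu: float) -> Frame:
        """1-D Gaussian frame with unit variance (toy data)."""
        return ([mu], [1.0])

    t1 = [g(0.0), g(1.0), g(2.0)]
    t2 = [g(0.0), g(0.0), g(1.0), g(2.0)]  # time-stretched copy of t1
    t3 = [g(5.0), g(5.0)]                  # an outlier template
    print(dtw_distance(t1, t2, symmetric_kl))  # -> 0.0
    print(dtw_distance(t1, t3, symmetric_kl))  # -> 50.0
    print(select_min_distance_template([t1, t2, t3], symmetric_kl))  # -> 0
```

With unit variances the symmetrized KL reduces to the squared mean difference, so the time-stretched template `t2` aligns to `t1` at zero cost, and DTW absorbs the length mismatch. Swapping `symmetric_kl` for another callable (for example an LLR-style score) changes only the local distance, not the alignment recursion.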

Citation (APA)

Sun, X., & Zhao, Y. (2014). Integrated exemplar-based template matching and statistical modeling for continuous speech recognition. Eurasip Journal on Audio, Speech, and Music Processing, 2014. https://doi.org/10.1186/1687-4722-2014-4

Readers' Seniority

PhD / Post grad / Masters / Doc: 4 (50%)
Researcher: 4 (50%)

Readers' Discipline

Computer Science: 4 (50%)
Engineering: 2 (25%)
Arts and Humanities: 1 (13%)
Environmental Science: 1 (13%)