This paper presents a Viterbi approximation of latent words language models (LWLMs) for automatic speech recognition (ASR). LWLMs are robust to data sparseness because of their soft-decision clustering structure and Bayesian modeling, which allows them to perform well across multiple ASR tasks. Unfortunately, applying an LWLM to ASR is difficult because of its computational complexity. In our previous work, we implemented an n-gram approximation of the LWLM for ASR by sampling words according to its stochastic process and training word n-gram LMs on the sampled data. However, that approach cannot take into account the latent word sequence behind a recognition hypothesis. Our solution is a Viterbi approximation that simultaneously decodes both the recognition hypothesis and its latent word sequence. The Viterbi approximation is implemented as two-pass ASR decoding in which the latent word sequence is estimated from a decoded recognition hypothesis using Gibbs sampling. Experiments show the effectiveness of the Viterbi approximation in an n-best rescoring framework. In addition, we investigate the relationship between the n-gram approximation and the Viterbi approximation.
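The two-pass procedure summarized above can be illustrated with a small sketch. The Python code below is not the authors' implementation: the names viterbi_approx_score, p_word_given_latent, and p_latent_ngram are hypothetical placeholders, a bigram latent model stands in for the paper's n-gram purely for brevity, and the number of Gibbs iterations is an arbitrary assumption. It only shows the general idea of estimating a single latent word sequence behind a decoded hypothesis by Gibbs sampling and using the resulting joint score for rescoring.

```python
import math
import random


def viterbi_approx_score(hypothesis, p_word_given_latent, p_latent_ngram,
                         vocab, n_iters=50, seed=0):
    """Hedged sketch of a Viterbi-style LWLM score for one hypothesis.

    Instead of marginalising over all latent word sequences, a single
    latent sequence is estimated by Gibbs sampling and its joint
    probability with the observed hypothesis is returned.

    hypothesis          -- list of observed words w_1..w_T
    p_word_given_latent -- callable (w, l) -> P(w | l)
    p_latent_ngram      -- callable (l, prev_l) -> P(l | prev_l) (bigram for brevity)
    vocab               -- candidate latent words (assumed list)
    """
    rng = random.Random(seed)
    # Initialise the latent sequence with the observed words themselves.
    latent = list(hypothesis)

    for _ in range(n_iters):
        for t, w in enumerate(hypothesis):
            prev_l = latent[t - 1] if t > 0 else "<s>"
            next_l = latent[t + 1] if t + 1 < len(latent) else "</s>"
            # Full conditional of l_t given its neighbours and the observed word.
            weights = [p_word_given_latent(w, l)
                       * p_latent_ngram(l, prev_l)
                       * p_latent_ngram(next_l, l)
                       for l in vocab]
            total = sum(weights)
            if total == 0.0:
                continue
            r, acc = rng.random() * total, 0.0
            for l, wt in zip(vocab, weights):
                acc += wt
                if acc >= r:
                    latent[t] = l
                    break

    # Joint log-probability of the hypothesis and the sampled latent sequence.
    score, prev_l = 0.0, "<s>"
    for w, l in zip(hypothesis, latent):
        score += math.log(p_word_given_latent(w, l) + 1e-12)
        score += math.log(p_latent_ngram(l, prev_l) + 1e-12)
        prev_l = l
    return score, latent
```

In an n-best rescoring setup, a score of this kind would typically be interpolated with the acoustic and baseline language model scores before the hypotheses are reranked.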
CITATION
Masumura, R., Asami, T., Oba, T., Masataki, H., & Sakauchi, S. (2019). Viterbi approximation of latent words language models for automatic speech recognition. Journal of Information Processing, 27, 168–176. https://doi.org/10.2197/ipsjjip.27.168