A new word language model evaluation metric for character based languages

12 Citations · 5 Readers

Abstract

Perplexity is a widely used measure for evaluating the word prediction power of a word-based language model. It can be computed independently of a recognizer and has shown good correlation with word error rate (WER) in speech recognition. For character-based languages, however, character error rate (CER) is commonly used instead of WER as the speech recognition measure, even though the language model is still word-based. Because different word segmentation strategies may produce different word vocabularies for the same text corpus, word-based perplexity is in many cases inadequate for evaluating the combined effect of word segmentation and language model training on the final CER. In this paper, a new word-based language model evaluation measure is proposed that accounts for the effect of word segmentation and the goal of predicting CER. Experiments were conducted on Chinese speech recognition. Compared to traditional word-based perplexity, the new measure is more robust to word segmentation and shows a much more consistent correlation with CER in a large-vocabulary continuous Chinese speech recognition task. © Springer-Verlag 2013.
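To make the baseline measure concrete, the standard word-based perplexity discussed above is the exponential of the average negative log-likelihood per word under the language model. A minimal sketch, using an illustrative toy unigram model (the vocabulary and probabilities here are assumptions for demonstration, not from the paper):

```python
import math

def perplexity(log_probs):
    """Perplexity of a test sequence, given per-word log-probabilities
    (natural log) assigned by a language model: exp of the average
    negative log-likelihood per word."""
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# Toy unigram model over a tiny vocabulary (probabilities are illustrative).
unigram = {"the": 0.5, "cat": 0.3, "sat": 0.2}
sentence = ["the", "cat", "sat"]
log_probs = [math.log(unigram[w]) for w in sentence]

ppl = perplexity(log_probs)  # geometric mean of the inverse word probabilities
```

Note that the value depends on what counts as a "word": re-segmenting the same character string changes both the vocabulary and the number of tokens N in the average, which is precisely why the paper argues word-based perplexity can mislead when the target metric is CER.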

Citation (APA)

Wang, P., Sun, R., Zhao, H., & Yu, K. (2013). A new word language model evaluation metric for character based languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8202 LNAI, pp. 315–324). https://doi.org/10.1007/978-3-642-41491-6_29
