A segment-based hidden Markov model for real-setting pinyin-to-Chinese conversion

Abstract

The hidden Markov model (HMM) is frequently used for Pinyin-to-Chinese conversion, but it captures only the dependency on the preceding character. Higher-order Markov models can bring higher accuracy, but they are computationally unaffordable on an average PC. We propose a segment-based hidden Markov model (SHMM), which has the same order of complexity as a first-order HMM but achieves higher decoding accuracy. SHMM distinguishes a word from a bigram that connects two words, and assigns a reasonable probability to each word as a whole. It is more powerful than HMM at decoding words that contain more than two characters. We conduct a comprehensive Pinyin-to-Chinese conversion evaluation on the Lancaster corpus. The experiment shows that perfect-sentence accuracy improves from 34.7% (HMM) to 43.3% (SHMM), and one-error sentence accuracy increases from 72.7% to 78.3%. Furthermore, SHMM integrates seamlessly with pinyin typing correction, acronym pinyin input, user-defined words, and self-adaptive learning, all of which a commercial Pinyin-to-Chinese conversion product needs in order to improve the efficiency of pinyin input. Copyright 2007 ACM.
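To make the baseline concrete, the following is a minimal sketch of first-order HMM decoding (Viterbi search) for pinyin-to-character conversion, in which each character depends only on the preceding one. It is not the authors' SHMM, whose details the abstract does not give. The candidate table, the probabilities, and the names CANDIDATES, START, TRANS, FLOOR and viterbi are hypothetical illustrations, and the sketch assumes every listed candidate emits its syllable with probability 1.

```python
import math

# Toy model: every entry and number below is a hypothetical illustration.
# Candidate characters for each pinyin syllable (emission probability
# is assumed to be 1 for each listed candidate).
CANDIDATES = {
    "zhong": ["中", "重"],
    "guo": ["国", "过"],
}
# P(first character of the sentence)
START = {"中": 0.6, "重": 0.4}
# P(current character | preceding character): the first-order dependency
TRANS = {
    ("中", "国"): 0.8, ("中", "过"): 0.2,
    ("重", "国"): 0.1, ("重", "过"): 0.9,
}
FLOOR = 1e-8  # assumed probability for unseen events


def viterbi(syllables):
    """Return (best character string, its log probability)."""
    # best[c] = (log probability of the best path ending in c, that path)
    best = {
        c: (math.log(START.get(c, FLOOR)), [c])
        for c in CANDIDATES[syllables[0]]
    }
    for syllable in syllables[1:]:
        step = {}
        for c in CANDIDATES[syllable]:
            # Pick the predecessor that maximises the path score.
            score, path = max(
                (prev_score + math.log(TRANS.get((p, c), FLOOR)), prev_path)
                for p, (prev_score, prev_path) in best.items()
            )
            step[c] = (score, path + [c])
        best = step
    score, path = max(best.values())
    return "".join(path), score


if __name__ == "__main__":
    print(viterbi(["zhong", "guo"]))
```

The search keeps one best path per candidate character at each syllable, so the cost grows with the square of the candidate count per step. A higher-order model would need one path per character history instead, which is the cost the abstract describes as unaffordable.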

Cite

Zhou, X., Hu, X., Zhang, X., & Shen, X. (2007). A segment-based hidden Markov model for real-setting pinyin-to-Chinese conversion. In International Conference on Information and Knowledge Management, Proceedings (pp. 1027–1030). https://doi.org/10.1145/1321440.1321602
