Character-based approaches have greatly improved the performance of Chinese word segmentation in recent years. With the help of powerful machine learning strategies, extracting words by combining characters has become the focus of Chinese word segmentation research. Despite their outstanding ability to discover out-of-vocabulary words, character-based approaches are not as good as word-based approaches at segmenting in-vocabulary words, because some internal and external information about the words is lost. In this paper we propose a joint decoding strategy that combines a character-based conditional random field model and a word-based bigram language model for segmenting Chinese character sequences. The experimental results demonstrate the good performance of our approach and show that the two sub-models are well integrated: the joint character-word model enhances the performance of Chinese word segmentation systems more effectively than either single model, and is thus suitable for many applications in Chinese information processing. © by Institute of Software, the Chinese Academy of Sciences. All rights reserved.
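The abstract does not spell out how the two models are combined, so the following is only a minimal sketch of what a character-word joint decoder might look like, not the authors' formulation. It assumes a linear interpolation of the two log-scores with a hypothetical `weight` parameter, BMES character tags, and toy stand-ins (`toy_char_logp`, `toy_bigram_logp`) in place of a trained conditional random field and a trained bigram language model. Decoding is a dynamic program over candidate words, keyed by end position and last word so that the bigram score can be applied.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, str]  # (end position in text, last word)


def tags_for(length: int) -> List[str]:
    """BMES tags for a word of the given length (assumed tag scheme)."""
    if length == 1:
        return ["S"]
    return ["B"] + ["M"] * (length - 2) + ["E"]


def joint_decode(
    text: str,
    char_logp: Callable[[str, int, str], float],
    bigram_logp: Callable[[str, str], float],
    weight: float = 0.5,   # hypothetical interpolation weight
    max_len: int = 4,      # hypothetical cap on candidate word length
) -> List[str]:
    """Return the segmentation with the best interpolated score."""
    best: Dict[State, Tuple[float, Optional[State]]] = {(0, "<s>"): (0.0, None)}
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            # Character-level score of labelling this span as one word.
            c_score = sum(
                char_logp(text, start + k, tag)
                for k, tag in enumerate(tags_for(len(word)))
            )
            # Extend every finished path that ends exactly at `start`.
            for (pos, prev), (score, _) in list(best.items()):
                if pos != start:
                    continue
                cand = score + weight * c_score + (1 - weight) * bigram_logp(prev, word)
                key = (end, word)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (pos, prev))
    # Pick the best path covering the whole text, then follow backpointers.
    finals = [k for k in best if k[0] == len(text)]
    state: Optional[State] = max(finals, key=lambda k: best[k][0])
    words: List[str] = []
    while state is not None and state != (0, "<s>"):
        words.append(state[1])
        state = best[state][1]
    return words[::-1]


# Toy stand-ins for the trained models (illustrative values only).
def toy_char_logp(text: str, i: int, tag: str) -> float:
    return math.log(0.25)  # uniform over B, M, E, S


TOY_BIGRAMS = {("<s>", "研究"): 0.5, ("研究", "生命"): 0.5, ("生命", "起源"): 0.5}


def toy_bigram_logp(prev: str, word: str) -> float:
    return math.log(TOY_BIGRAMS.get((prev, word), 1e-6))  # crude floor for unseen pairs


print(joint_decode("研究生命起源", toy_char_logp, toy_bigram_logp))
```

With a uniform character model, the word-level bigram scores alone decide between competing segmentations; with trained models, the `weight` parameter would control how much each side contributes.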
CITATION STYLE
Song, Y., Cai, D. F., Zhang, G. P., & Zhao, H. (2009). Approach to Chinese word segmentation based on character-word joint decoding. Ruan Jian Xue Bao/Journal of Software, 20(9), 2366–2375. https://doi.org/10.3724/SP.J.1001.2009.03606