Approach to Chinese word segmentation based on character-word joint decoding

Yan Song; Dong Feng Cai; Gui Ping Zhang; Hai Zhao

Journal Article, Open Access

Ruan Jian Xue Bao/Journal of Software (2009) 20(9) 2366-2375

DOI: 10.3724/SP.J.1001.2009.03606

Abstract

Character-based approaches have greatly improved the performance of Chinese word segmentation in recent years. Supported by powerful machine learning strategies, extracting words by combining characters has become the focus of Chinese word segmentation research. Although character-based approaches are outstanding at discovering out-of-vocabulary words, they are weaker than word-based approaches at segmenting in-vocabulary words, because some internal and external information about the words is lost. In this paper we propose a joint decoding strategy that combines a character-based conditional random field model with a word-based bigram language model for segmenting Chinese character sequences. The experimental results demonstrate the good performance of our approach and show that the two sub-models are well integrated: the joint character-word model improves the performance of Chinese word segmentation systems more effectively than either single model, and is therefore suitable for many applications in Chinese information processing. © by Institute of Software, the Chinese Academy of Sciences. All rights reserved.
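The abstract does not give the scoring formula, so the following is only a minimal sketch of what character-word joint decoding could look like, not the authors' implementation. It assumes a linear interpolation (the hypothetical `weight` parameter) between per-character tag log-scores, which stand in for the output of a character-based CRF, and word bigram log-probabilities. The function names, the B/M/E/S tag set, the `unk_logp` fallback, and all toy numbers are assumptions made for illustration.

```python
import math

def word_char_score(char_logp, start, end):
    """Character-model score for treating positions start..end-1 as one word.

    Tags (assumed): B = begin, M = middle, E = end, S = single-character word.
    """
    if end - start == 1:
        return char_logp[start]["S"]
    score = char_logp[start]["B"] + char_logp[end - 1]["E"]
    for i in range(start + 1, end - 1):
        score += char_logp[i]["M"]
    return score

def joint_decode(chars, char_logp, bigram_logp, weight=0.5,
                 max_len=4, unk_logp=-10.0):
    """Viterbi search over candidate words; state = (end position, last word)."""
    n = len(chars)
    # best[pos][word] = (score, (previous position, previous word))
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = chars[start:end]
            char_part = word_char_score(char_logp, start, end)
            for prev, (score, _) in best[start].items():
                lm_part = bigram_logp.get((prev, word), unk_logp)
                total = score + weight * char_part + (1 - weight) * lm_part
                if word not in best[end] or total > best[end][word][0]:
                    best[end][word] = (total, (start, prev))
    # Follow back-pointers from the best final state.
    word = max(best[n], key=lambda w: best[n][w][0])
    pos, words = n, []
    while pos > 0:
        words.append(word)
        pos, word = best[pos][word][1]
    return words[::-1]

def toy_tags(preferred):
    """Toy tag scores: 0.7 for the preferred tag, 0.1 for each other tag."""
    return {t: math.log(0.7 if t == preferred else 0.1) for t in "BMES"}

# Toy character model that prefers the tagging B M E S (one 3-char word + 1 char).
char_logp = [toy_tags(t) for t in "BMES"]
# Toy bigram model that prefers a 2 + 2 split.
bigram_logp = {
    ("<s>", "研究"): -1.0, ("研究", "生命"): -1.0,
    ("<s>", "研究生"): -1.5, ("研究生", "命"): -4.0,
}

print(joint_decode("研究生命", char_logp, bigram_logp, weight=0.0))  # word model only
print(joint_decode("研究生命", char_logp, bigram_logp, weight=1.0))  # character model only
```

In this toy setup the two sub-models disagree: with `weight=0.0` the bigram scores alone select 研究 / 生命, while with `weight=1.0` the character scores alone select 研究生 / 命. Intermediate weights trade one source of evidence against the other, which is the general idea of combining the two models in a single decoding pass.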

Citation (APA)

Song, Y., Cai, D. F., Zhang, G. P., & Zhao, H. (2009). Approach to Chinese word segmentation based on character-word joint decoding. Ruan Jian Xue Bao/Journal of Software, 20(9), 2366–2375. https://doi.org/10.3724/SP.J.1001.2009.03606
