A trigram statistical language model algorithm for Chinese word segmentation

Jun Mao; Gang Cheng; Yanxiang He; Zehuan Xing

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4613 LNCS 271-280

DOI: 10.1007/978-3-540-73814-5_26

Abstract

We address the problem of segmenting Chinese text into words and propose a trigram model algorithm for the task. We discuss why a statistical language model is appropriate for Chinese word segmentation and give the segmentation algorithm. In particular, we address the search problem introduced by the trigram model, which often leads to low performance. Finally, we discuss out-of-vocabulary (OOV) word identification and merge it into the trigram-based method to improve segmentation accuracy. © Springer-Verlag Berlin Heidelberg 2007.
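The abstract does not spell out the algorithm, so the following is only a minimal sketch of how trigram-based segmentation is commonly set up, not the authors' method. It assumes a hypothetical word lexicon, toy trigram and bigram counts, add-alpha smoothing, and a Viterbi-style dynamic program whose state is the last two words; the helper names `make_trigram_logprob` and `segment` are illustrative. Allowing any single character as a word is a crude stand-in for OOV handling, not the OOV identification the paper describes.

```python
import math


def make_trigram_logprob(trigram_counts, bigram_counts, vocab_size, alpha=1.0):
    """Return log P(w3 | w1, w2) with add-alpha smoothing (an assumed choice)."""
    def logprob(w1, w2, w3):
        num = trigram_counts.get((w1, w2, w3), 0) + alpha
        den = bigram_counts.get((w1, w2), 0) + alpha * vocab_size
        return math.log(num / den)
    return logprob


def segment(text, lexicon, logprob, max_word_len=4):
    """Find the highest-scoring segmentation under a trigram model.

    chart[i] maps a state (second-to-last word, last word) ending at
    character offset i to (best log score, backpointer).
    """
    bos = "<s>"
    n = len(text)
    chart = [dict() for _ in range(n + 1)]
    chart[0][(bos, bos)] = (0.0, None)
    for i in range(n):
        for (w1, w2), (score, _) in chart[i].items():
            for j in range(i + 1, min(n, i + max_word_len) + 1):
                w3 = text[i:j]
                # Multi-character candidates must be in the lexicon;
                # single characters are always allowed as a fallback.
                if j - i > 1 and w3 not in lexicon:
                    continue
                new_score = score + logprob(w1, w2, w3)
                key = (w2, w3)
                if key not in chart[j] or new_score > chart[j][key][0]:
                    chart[j][key] = (new_score, (i, (w1, w2)))
    # Backtrace from the best final state.
    state = max(chart[n], key=lambda k: chart[n][k][0])
    pos, words = n, []
    while pos > 0:
        words.append(state[1])
        _, (pos, state) = chart[pos][state]
    return words[::-1]


# Toy data, invented for illustration only.
lexicon = {"研究", "研究生", "生命", "起源"}
trigram_counts = {
    ("<s>", "<s>", "研究"): 5,
    ("<s>", "研究", "生命"): 4,
    ("研究", "生命", "起源"): 4,
    ("<s>", "<s>", "研究生"): 1,
}
bigram_counts = {
    ("<s>", "<s>"): 6,
    ("<s>", "研究"): 5,
    ("研究", "生命"): 4,
    ("<s>", "研究生"): 1,
}
logprob = make_trigram_logprob(trigram_counts, bigram_counts, vocab_size=10)
result = segment("研究生命起源", lexicon, logprob)
print(" / ".join(result))
```

Keeping only the best score per (second-to-last word, last word) state at each offset bounds the search, which is one common way to limit the cost of trigram decoding; whether the paper uses this approach is not stated in the abstract.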

Citation (APA)

Mao, J., Cheng, G., He, Y., & Xing, Z. (2007). A trigram statistical language model algorithm for Chinese word segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4613 LNCS, pp. 271–280). Springer Verlag. https://doi.org/10.1007/978-3-540-73814-5_26
