A joint model for unsupervised Chinese word segmentation

Miaohong Chen; Baobao Chang; Wenzhe Pei

Conference Proceedings

EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (2014) 854-863

DOI: 10.3115/v1/d14-1092

Abstract

In this paper, we propose a joint model for unsupervised Chinese word segmentation (CWS). Inspired by the "products of experts" idea, our joint model first combines two generative models, a word-based hierarchical Dirichlet process (HDP) model and a character-based hidden Markov model (HMM), by simply multiplying their probabilities together. Gibbs sampling is used for model inference. To further incorporate the strengths of goodness-based models, we then integrate nVBE into our joint model by using it to initialize the Gibbs sampler. We conduct our experiments on the PKU and MSRA datasets provided by the second SIGHAN bakeoff. Test results on these two datasets show that the joint model achieves much better results than all of its component models. Statistical significance tests also show that it is significantly better than state-of-the-art systems, achieving the highest F-scores. Finally, analysis indicates that, compared with nVBE and HDP, the joint model has a stronger ability to resolve both combinational and overlapping ambiguities in Chinese word segmentation.
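
The "multiply the probabilities together" step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `product_of_experts` and `sample_boundary` are hypothetical, the input numbers are invented for illustration, and the two input distributions merely stand in for whatever the word-based and character-based models would assign to "no boundary" versus "boundary" at a single character gap. It only shows how two experts' scores might be multiplied, renormalized, and then used for one Gibbs-style draw.

```python
import random


def product_of_experts(p_a, p_b):
    """Multiply two distributions over the same outcomes and renormalize."""
    if len(p_a) != len(p_b):
        raise ValueError("both experts must score the same outcomes")
    joint = [a * b for a, b in zip(p_a, p_b)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("experts assign zero joint mass to every outcome")
    return [j / total for j in joint]


def sample_boundary(p_word, p_char, rng):
    """One Gibbs-style draw for a single gap: 0 = no boundary, 1 = boundary."""
    probs = product_of_experts(p_word, p_char)
    return 1 if rng.random() < probs[1] else 0


# Invented scores for one character gap (assumption, not from the paper):
# index 0 = no boundary, index 1 = boundary.
p_word = [0.4, 0.6]    # stand-in for the word-based model
p_char = [0.25, 0.75]  # stand-in for the character-based model

combined = product_of_experts(p_word, p_char)
decision = sample_boundary(p_word, p_char, random.Random(0))
```

In this toy case both experts lean toward a boundary, so the combined distribution is sharper than either one alone. An outcome that either expert scores near zero stays near zero in the product, which is the usual motivation for this kind of combination.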

Cite

CITATION STYLE

APA

Chen, M., Chang, B., & Pei, W. (2014). A joint model for unsupervised Chinese word segmentation. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 854–863). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1092
