Abstract
In this paper, we propose a joint model for unsupervised Chinese word segmentation (CWS). Inspired by the "products of experts" idea, our joint model first combines two generative models, a word-based hierarchical Dirichlet process (HDP) model and a character-based hidden Markov model (HMM), by simply multiplying their probabilities together. Gibbs sampling is used for model inference. To further incorporate the strengths of goodness-based models, we then integrate nVBE into our joint model by using it to initialize the Gibbs sampler. We conduct our experiments on the PKU and MSRA datasets provided by the second SIGHAN bakeoff. Test results on these two datasets show that the joint model achieves much better results than all of its component models. Statistical significance tests also show that it is significantly better than state-of-the-art systems, achieving the highest F-scores. Finally, analysis indicates that, compared with nVBE and HDP, the joint model has a stronger ability to resolve both combinational and overlapping ambiguities in Chinese word segmentation.
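To make the "multiplying their probabilities together" step concrete, the following is a minimal sketch of a product-of-experts combination over a fixed set of candidate segmentations. It is an illustration under stated assumptions, not the authors' implementation: the function name `product_of_experts`, the candidate labels, and the toy probabilities are all hypothetical, the HDP and HMM models themselves are not implemented, and the explicit renormalization is an assumption about how the product would be turned into a sampling distribution.

```python
import math


def product_of_experts(p_word, p_char):
    """Multiply two distributions over the same candidates and renormalize.

    p_word and p_char are hypothetical stand-ins for the scores that a
    word-based model and a character-based model assign to each candidate
    segmentation. All probabilities are assumed to be strictly positive.
    """
    candidates = p_word.keys() & p_char.keys()
    # Work in log space so that products of small probabilities stay stable.
    log_scores = {c: math.log(p_word[c]) + math.log(p_char[c]) for c in candidates}
    max_log = max(log_scores.values())
    unnormalized = {c: math.exp(s - max_log) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Toy numbers chosen for illustration only; they do not come from the paper.
p_word = {"AB|C": 0.6, "A|BC": 0.4}
p_char = {"AB|C": 0.3, "A|BC": 0.7}
joint = product_of_experts(p_word, p_char)
```

In this toy case the two experts disagree, and the combined distribution favors the candidate whose product of probabilities is larger. A Gibbs sampler could draw from a distribution of this kind at each step, although the exact sampling scheme used in the paper is not specified in the abstract.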
Cite
Chen, M., Chang, B., & Pei, W. (2014). A joint model for unsupervised Chinese word segmentation. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 854–863). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1092