Subword-based tagging for confidence-dependent Chinese word segmentation

Ruiqiang Zhang; Genichiro Kikui; Eiichiro Sumita

Conference Proceedings

COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Main Conference Poster Sessions (2006) 961-968

DOI: 10.3115/1273073.1273196

24 Citations

89 Readers

Abstract

We proposed subword-based tagging for Chinese word segmentation as an improvement over existing character-based tagging. The subword-based tagging was implemented using the maximum entropy (MaxEnt) and conditional random fields (CRF) methods. We found that the proposed subword-based tagging outperformed character-based tagging in all comparative experiments. In addition, we proposed a confidence measure approach for combining the results of a dictionary-based segmentation and a subword-tagging-based segmentation. This approach can produce an ideal tradeoff between the in-vocabulary rate and the out-of-vocabulary rate. Our techniques were evaluated using the test data from the SIGHAN Bakeoff 2005. We achieved higher F-scores than the best results in three of the four corpora: PKU (0.951), CITYU (0.950) and MSR (0.971).
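To make the idea of confidence-dependent combination concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a B/I/O tag scheme (B = first unit of a multi-unit word, I = continuation, O = a unit that is a word by itself), a per-unit confidence score from the tagger, and a fixed threshold. The abstract does not specify the confidence measure, so the function names, the scores, and the threshold value here are all hypothetical.

```python
from typing import List


def tags_to_words(units: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from units (characters or subwords) and their tags.

    Assumed scheme: "I" continues the previous word; any other tag
    ("B" or "O") starts a new word. This tolerant rule also copes with
    inconsistent tag sequences produced by mixing two sources.
    """
    words: List[str] = []
    for unit, tag in zip(units, tags):
        if tag == "I" and words:
            words[-1] += unit
        else:
            words.append(unit)
    return words


def combine_by_confidence(
    tagger_tags: List[str],
    tagger_conf: List[float],
    dict_tags: List[str],
    threshold: float = 0.7,  # hypothetical value, not taken from the paper
) -> List[str]:
    """Keep the tagger's tag where it is confident; otherwise fall back
    to the tag implied by the dictionary-based segmentation."""
    return [
        t if c >= threshold else d
        for t, c, d in zip(tagger_tags, tagger_conf, dict_tags)
    ]


# Units are subwords: "北京" is a single multi-character unit.
units = ["北京", "大", "学"]
tagger_tags = ["B", "I", "I"]     # tagger output: one word, 北京大学
tagger_conf = [0.9, 0.3, 0.9]     # made-up confidence scores
dict_tags = ["O", "B", "I"]       # dictionary output: 北京 / 大学

combined = combine_by_confidence(tagger_tags, tagger_conf, dict_tags)
segmented = tags_to_words(units, combined)
```

In this sketch, raising the threshold hands more decisions to the dictionary-based segmentation and lowering it hands more to the tagger, which is one simple way to picture the tradeoff between in-vocabulary and out-of-vocabulary performance that the abstract describes.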

Cite

Zhang, R., Kikui, G., & Sumita, E. (2006). Subword-based tagging for confidence-dependent Chinese word segmentation. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Main Conference Poster Sessions (pp. 961–968). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1273073.1273196
