We proposed two approaches to improve Chinese word segmentation: subword-based tagging and a confidence measure approach. We found that the former outperformed existing character-based tagging, and that the latter improved segmentation further by combining the former with dictionary-based segmentation. In addition, the latter can be used to balance out-of-vocabulary and in-vocabulary rates. With these techniques we achieved higher F-scores on the CITYU, PKU and MSR corpora than the best results from the SIGHAN Bakeoff 2005.
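The abstract does not spell out how subword tagging or the confidence measure work, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Assumptions: subwords are found by greedy forward maximum matching against a hypothetical lexicon; each word's subwords are tagged `B` (first unit) or `I` (continuation); the dictionary-based segmentation has already been mapped to tags over the same units; and the confidence of a tagger decision is assumed to be a weighted sum, controlled by a hypothetical `alpha`, of the tagger's probability for its chosen tag and its agreement with the dictionary tag, falling back to the dictionary tag below a hypothetical `threshold`. The CRF tagger itself is not shown; its tags and marginal probabilities are treated as given inputs.

```python
from typing import List, Sequence, Set, Tuple


def split_into_subwords(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching against a hypothetical subword lexicon.
    Anything not covered by a longer lexicon entry becomes a single character."""
    units: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in lexicon:
                units.append(piece)
                i += length
                break
    return units


def tag_segmented_sentence(words: Sequence[str], lexicon: Set[str]) -> List[Tuple[str, str]]:
    """Turn a word-segmented sentence into (subword, tag) training pairs.
    B = first subword of a word, I = any later subword of the same word."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        for k, unit in enumerate(split_into_subwords(word, lexicon)):
            pairs.append((unit, "B" if k == 0 else "I"))
    return pairs


def tags_to_words(pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """Rebuild words from (subword, tag) pairs: B starts a word, I extends it."""
    words: List[str] = []
    for unit, tag in pairs:
        if tag == "B" or not words:
            words.append(unit)
        else:
            words[-1] += unit
    return words


def combine_with_dictionary(
    units: Sequence[str],
    tagger_tags: Sequence[str],
    tagger_probs: Sequence[float],
    dict_tags: Sequence[str],
    alpha: float = 0.7,      # hypothetical weight on the tagger's probability
    threshold: float = 0.5,  # hypothetical cut-off for trusting the tagger
) -> List[Tuple[str, str]]:
    """Keep the tagger's tag when the assumed confidence score is high enough,
    otherwise fall back to the dictionary-based tag for that unit."""
    combined: List[Tuple[str, str]] = []
    for unit, tag, prob, dtag in zip(units, tagger_tags, tagger_probs, dict_tags):
        agreement = 1.0 if tag == dtag else 0.0
        confidence = alpha * prob + (1 - alpha) * agreement
        combined.append((unit, tag if confidence >= threshold else dtag))
    return combined


lexicon = {"北京", "大学", "学生"}  # hypothetical subword lexicon
pairs = tag_segmented_sentence(["北京大学", "的", "学生"], lexicon)
print(pairs)  # [('北京', 'B'), ('大学', 'I'), ('的', 'B'), ('学生', 'B')]

units = [u for u, _ in pairs]
merged = combine_with_dictionary(
    units,
    tagger_tags=["B", "B", "B", "B"],        # tagger wrongly splits 北京|大学
    tagger_probs=[0.9, 0.55, 0.95, 0.8],     # but is unsure about that decision
    dict_tags=["B", "I", "B", "B"],
)
print(tags_to_words(merged))  # ['北京大学', '的', '学生']
```

In this sketch, lowering `threshold` keeps more of the tagger's decisions, which tends to help out-of-vocabulary words, while raising it defers more often to the dictionary, which tends to help in-vocabulary words; that trade-off is one way the balance between out-of-vocabulary and in-vocabulary rates could be tuned.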
Zhang, R., Kikui, G., & Sumita, E. (2006). Subword-based tagging by conditional random fields for Chinese word segmentation. In HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Short Papers (pp. 193–196). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1614049.1614098