Subword-based tagging by conditional random fields for Chinese word segmentation

Abstract

We propose two approaches to improving Chinese word segmentation: subword-based tagging and a confidence measure. Subword-based tagging outperformed the existing character-based tagging, and the confidence measure improved segmentation further by combining the subword-based tagging with a dictionary-based segmentation. The confidence measure can also be used to balance out-of-vocabulary rates and in-vocabulary rates. With these techniques we achieved higher F-scores on the CITYU, PKU and MSR corpora than the best results from the SIGHAN Bakeoff 2005.
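To make the difference between character-based and subword-based tagging concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-tag scheme (B for the first unit of a word, I for the rest), a greedy longest-match split against a subword lexicon, and hypothetical helper names (`to_units`, `tag_sentence`); the abstract specifies none of these details. It only produces the tagged units a sequence labeller such as a CRF would be trained on, and does not cover the CRF itself or the confidence measure.

```python
def to_units(word, lexicon, max_len=4):
    """Split one word into tagging units by greedy longest match.

    A piece longer than one character is kept only if it is in `lexicon`
    (an assumed subword list); otherwise we fall back to single characters.
    """
    units = []
    i = 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in lexicon:
                units.append(piece)
                i += n
                break
    return units


def tag_sentence(words, lexicon=frozenset()):
    """Turn a gold-segmented sentence into (unit, tag) training pairs.

    With an empty lexicon every unit is a single character, which gives
    character-based tagging; a non-empty lexicon gives subword-based tagging.
    """
    tagged = []
    for word in words:
        for k, unit in enumerate(to_units(word, lexicon)):
            tagged.append((unit, "B" if k == 0 else "I"))
    return tagged


words = ["北京市", "长大"]                    # gold segmentation (illustrative)
char_tags = tag_sentence(words)               # character-based units
subword_tags = tag_sentence(words, {"北京"})  # "北京" is an assumed lexicon entry
```

With the subword lexicon, the sequence to be labelled is shorter (four units instead of five here), because frequent multi-character strings are treated as single tagging units.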

Cite

Zhang, R., Kikui, G., & Sumita, E. (2006). Subword-based tagging by conditional random fields for Chinese word segmentation. In HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Short Papers (pp. 193–196). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1614049.1614098
