Synthetic word parsing improves Chinese word segmentation

Abstract

We present a novel solution to improve the performance of Chinese word segmentation (CWS) using a synthetic word parser. The parser analyses the internal structure of words, and attempts to convert out-of-vocabulary words (OOVs) into in-vocabulary fine-grained sub-words. We propose a pipeline CWS system that first predicts this fine-grained segmentation, then chunks the output to reconstruct the original word segmentation standard. We achieve competitive results on the PKU and MSR datasets, with substantial improvements in OOV recall.
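The two-stage idea in the abstract (split words into in-vocabulary sub-words, then chunk them back into words) can be illustrated with a toy sketch. This is not the authors' parser or chunker: the greedy longest-match splitter, the B/I label scheme, the function names `split_into_subwords` and `chunk`, and the tiny vocabulary are all assumptions made for illustration, and the labels that a trained chunker would predict are supplied by hand here.

```python
def split_into_subwords(word, vocab):
    """Greedy longest-match split of `word` into in-vocabulary pieces.

    Stand-in for a synthetic word parser (an assumption, not the paper's
    method). Falls back to a single character when nothing in `vocab` matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; a single character always succeeds.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def chunk(subwords, labels):
    """Merge sub-words into words using B (begin) / I (inside) labels.

    In a real pipeline a trained chunker would predict `labels`;
    here they are given by hand.
    """
    words = []
    for piece, label in zip(subwords, labels):
        if label == "B" or not words:
            words.append(piece)
        else:
            words[-1] += piece
    return words


# Hypothetical vocabulary in which "研究生" (graduate student) is OOV.
vocab = {"研究", "生", "科学"}
fine_grained = split_into_subwords("研究生", vocab) + ["科学"]
restored = chunk(fine_grained, ["B", "I", "B"])
```

In this sketch the OOV word is covered by two in-vocabulary pieces in the first stage, and the second stage recovers the coarser word boundary from the labels, which mirrors the pipeline order the abstract describes.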

Cite

Cheng, F., Duh, K., & Matsumoto, Y. (2015). Synthetic word parsing improves Chinese word segmentation. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 262–267). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2043
