A concise model for multi-criteria Chinese word segmentation with transformer encoder


Abstract

Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among multiple heterogeneous segmentation criteria and thereby improve the performance on each single criterion. Previous work usually treats the criteria in MCCWS as different tasks that are learned together under a multi-task learning framework. In this paper, we propose a concise but effective unified model for MCCWS that is fully shared across all criteria. By leveraging the Transformer encoder, the proposed unified model segments Chinese text according to a unique criterion token that indicates the output criterion. In addition, the unified model can segment both simplified and traditional Chinese and has excellent transfer capability. Experiments on eight datasets with different criteria show that our model outperforms our single-criterion baseline model and other multi-criteria models. The source code for this paper is available on GitHub.
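
The criterion-token idea in the abstract can be illustrated with a small input/output sketch. This is not the authors' implementation and contains no model: it only shows how a criterion token might be prepended to the character sequence, and how per-character tags could be turned back into words. The bracketed token format (`[pku]`), the B/M/E/S tag scheme, the function names `build_input` and `tags_to_words`, and the hand-written tags in the example are all assumptions made for illustration.

```python
from typing import List


def build_input(text: str, criterion: str) -> List[str]:
    """Prepend a criterion token (hypothetical format) to the characters."""
    return [f"[{criterion}]"] + list(text)


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Group characters into words using B/M/E/S tags (assumed scheme).

    B = begin, M = middle, E = end of a multi-character word,
    S = single-character word.
    """
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # flush a trailing unfinished word
        words.append(current)
    return words


sentence = "我爱北京"
tokens = build_input(sentence, "pku")

# Hand-written tags standing in for a model's prediction.
example_tags = ["S", "S", "B", "E"]
words = tags_to_words(list(sentence), example_tags)
```

In such a setup, switching the criterion would only change the first token of the input while the model parameters stay shared, which is the property the abstract describes as "fully shared".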

Cite

Qiu, X., Pei, H., Yan, H., & Huang, X. (2020). A concise model for multi-criteria Chinese word segmentation with transformer encoder. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 2887–2897). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.260
