Federated Chinese Word Segmentation with Global Character Associations

Abstract

Chinese word segmentation (CWS) is a fundamental task in Chinese information processing that always suffers from out-of-vocabulary word issues, especially when it is tested on data from different sources. Although one possible solution is to use more training data, in real applications these data are stored at different locations and are thus invisible to and isolated from each other owing to privacy or legal issues (e.g., clinical reports from different hospitals). To address this issue and benefit from the extra data, we propose a neural model for CWS that adopts federated learning (FL) to deal with data isolation, together with a mechanism of global character associations that enhances FL's ability to learn from different data sources. Experimental results in a simulated environment with five nodes confirm the effectiveness of our approach, which outperforms several baselines, including some well-designed FL frameworks.
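
To make the data-isolation setting concrete, the following is a minimal sketch of generic federated averaging, in which each node trains locally and only model parameters (never raw text) are sent to a server for aggregation. It is an illustration only: it does not implement the paper's global character association mechanism, and the function name `federated_average`, the parameter layout, and the toy numbers are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(node_weights: List[Weights], node_sizes: List[int]) -> Weights:
    """Average per-node parameters, weighted by each node's local example count."""
    if not node_weights or len(node_weights) != len(node_sizes):
        raise ValueError("need one size per node and at least one node")
    total = sum(node_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    averaged: Weights = {}
    for name in node_weights[0]:
        length = len(node_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(node_weights, node_sizes)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Five simulated nodes (toy values), each holding private data of a different size.
    local_models = [{"emb": [float(k), float(k) + 1.0]} for k in range(5)]
    local_sizes = [100, 200, 300, 200, 200]
    global_model = federated_average(local_models, local_sizes)
    print(global_model)
```

In a real FL round the server would send the averaged parameters back to every node, which would then continue training on its own private data before the next aggregation.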

Cite

Tian, Y., Chen, G., Qin, H., & Song, Y. (2021). Federated Chinese Word Segmentation with Global Character Associations. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4306–4313). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.376
