Federated Chinese Word Segmentation with Global Character Associations

Yuanhe Tian; Guimin Chen; Han Qin; Yan Song

Conference Proceedings, Open Access

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (2021) 4306-4313

DOI: 10.18653/v1/2021.findings-acl.376

Abstract

Chinese word segmentation (CWS) is a fundamental task in Chinese information processing. It consistently suffers from out-of-vocabulary word issues, especially when a model is tested on data from different sources. One possible solution is to use more training data, but in real applications such data are stored at different locations and are therefore invisible to, and isolated from, one another owing to privacy or legal issues (e.g., clinical reports from different hospitals). To address this issue and benefit from the extra data, we propose a neural model for CWS that adopts federated learning (FL) to deal with data isolation, together with a mechanism of global character associations that enhances FL's ability to learn from different data sources. Experimental results in a simulated environment with five nodes confirm the effectiveness of our approach, which outperforms several baselines, including some well-designed FL frameworks.

Cite

Tian, Y., Chen, G., Qin, H., & Song, Y. (2021). Federated Chinese Word Segmentation with Global Character Associations. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4306–4313). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.376
