Chinese word segmentation (CWS) is a fundamental task in Chinese information processing. It often suffers from out-of-vocabulary words, especially when a model is tested on data from different sources. One possible solution is to use more training data, but in real applications such data are stored at different locations and are therefore invisible to and isolated from each other owing to privacy or legal issues (e.g., clinical reports from different hospitals). To address this issue and benefit from the extra data, we propose a neural model for CWS that adopts federated learning (FL) to deal with data isolation, together with a mechanism of global character associations that enhances FL's ability to learn from different data sources. Experimental results in a simulated environment with five nodes confirm the effectiveness of our approach, which outperforms different baselines, including some well-designed FL frameworks.
CITATION STYLE
Tian, Y., Chen, G., Qin, H., & Song, Y. (2021). Federated Chinese Word Segmentation with Global Character Associations. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 4306–4313). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.376