Neural networks incorporating unlabeled and partially-labeled data for cross-domain Chinese word segmentation

Lujun Zhao; Qi Zhang; Peng Wang; Xiaoyu Liu

Conference Proceedings

IJCAI International Joint Conference on Artificial Intelligence (2018) 2018-July 4602-4608

DOI: 10.24963/ijcai.2018/640

23 Citations

20 Readers

Abstract

Most existing Chinese word segmentation (CWS) methods are supervised and therefore require large-scale annotated domain-specific datasets for training. In this paper, we address the problem of CWS for resource-poor domains that lack annotated data. We propose a novel neural network model that incorporates unlabeled and partially-labeled data. To make use of unlabeled data, we combine a bidirectional LSTM segmentation model with two character-level language models using a gate mechanism. These language models can capture co-occurrence information. To make use of partially-labeled data, we modify the original cross-entropy loss function of the RNN. Experimental results demonstrate that the method performs well on CWS tasks in a series of domains.
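
The abstract does not spell out how the cross-entropy loss is modified for partially-labeled data. One common way to train on partial labels, shown here only as a rough sketch and not as the authors' formulation, is to sum the predicted probability over every tag that the partial annotation still permits at each character. The BMES tag set, the function names, and the `allowed` representation below are all assumptions made for illustration.

```python
import math

# Assumed tagging scheme (Begin, Middle, End, Single); not taken from the abstract.
TAGS = ["B", "M", "E", "S"]


def softmax(scores):
    """Convert raw per-tag scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def partial_cross_entropy(logits, allowed):
    """Average negative log of the probability mass on permitted tags.

    logits:  one list of len(TAGS) scores per character.
    allowed: one set of permitted tags per character. A fully labeled
             character has a single tag; an unlabeled one has all tags.
    """
    loss = 0.0
    for scores, tags in zip(logits, allowed):
        probs = softmax(scores)
        mass = sum(probs[TAGS.index(t)] for t in tags)
        loss -= math.log(mass)
    return loss / len(logits)


# Hypothetical two-character example: the first character is known to start
# a word (B or S), the second is unconstrained.
example_logits = [[2.0, 0.1, 0.1, 1.0], [0.0, 0.0, 0.0, 0.0]]
example_allowed = [{"B", "S"}, set(TAGS)]
print(partial_cross_entropy(example_logits, example_allowed))
```

With a single permitted tag per character this reduces to the ordinary cross-entropy loss, and a character with no constraint contributes essentially zero, so unlabeled positions do not push the model toward any particular tag.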

Citation (APA)

Zhao, L., Zhang, Q., Wang, P., & Liu, X. (2018). Neural networks incorporating unlabeled and partially-labeled data for cross-domain Chinese word segmentation. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 4602–4608). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/640
