Domain-specific Chinese word segmentation with document-level optimization

1Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Previous studies normally formulate Chinese word segmentation as a character sequence labeling task and optimize the solution in sentence-level. In this paper, we address Chinese word segmentation as a document-level optimization problem. First, we apply a state-of-the-art approach, i.e., long short-term memory (LSTM), to perform character classification; Then, we propose a global objective function on the basis of character classification and achieve global optimization via Integer Linear Programming (ILP). Specifically, we propose several kinds of global constrains in ILP to capture various segmentation knowledge, such as segmentation consistency and domain-specific regulations, to achieve document-level optimization, besides label transition knowledge to achieve sentence-level optimization. Empirical studies demonstrate the effectiveness of the proposed approach to domain-specific Chinese word segmentation.

Cite

CITATION STYLE

APA

Yan, Q., Shen, C., Li, S., Xia, F., & Du, Z. (2018). Domain-specific Chinese word segmentation with document-level optimization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10619 LNAI, pp. 353–365). Springer Verlag. https://doi.org/10.1007/978-3-319-73618-1_30

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free