In this paper, we give an overview for the shared task at the 5th CCF Conference on Natural Language Processing & Chinese Computing (NLPCC 2016): Chinese word segmentation for micro-blog texts. Different with the popular used newswire datasets, the dataset of this shared task consists of the relatively informal micro-texts. Besides, we also use a new psychometric-inspired evaluation metric for Chinese word segmentation, which addresses to balance the very skewed word distribution at different levels of difficulty. The data and evaluation codes can be downloaded from https://github.com/FudanNLP/ NLPCC-WordSeg-Weibo.
CITATION STYLE
Qiu, X., Qian, P., & Shi, Z. (2016). Overview of the NLPCC-ICCPOL 2016 shared task: Chinese word segmentation for micro-blog texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10102, pp. 901–906). Springer Verlag. https://doi.org/10.1007/978-3-319-50496-4_84
Mendeley helps you to discover research relevant for your work.