Marketing notification texts contain a wide range of commercial information, so analyzing and mining them is worthwhile. Chinese word segmentation is the foundation of this work, and its speed and accuracy strongly affect subsequent text mining. We compared the accuracy, recall, and F-value of four open-source Chinese word segmentation tools (Ansj, HanLP, Word, and Jieba) on third-party datasets. We then compared the segmentation speed of the four tools on one million marketing notification texts. Finally, we manually segmented 5,000 marketing notification texts and used the manual segmentation as the evaluation standard for assessing the tools. The experiments show that the Base mode of Ansj is the fastest, and that HanLP offers the best balance between segmentation speed and accuracy. Adding a custom dictionary significantly improved segmentation quality.
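The evaluation against a manual standard can be illustrated with a minimal sketch. It assumes that "accuracy" here means precision in the usual word-segmentation sense (the fraction of predicted words whose boundaries match the manual segmentation), and that the F-value is the harmonic mean of precision and recall. The function names `to_spans` and `prf` and the sample sentence are hypothetical and do not come from the paper.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def prf(gold_words, pred_words):
    """Return (precision, recall, F-value) of a predicted segmentation.

    A predicted word counts as correct only if both of its boundaries
    match a word in the manual (gold) segmentation.
    """
    if "".join(gold_words) != "".join(pred_words):
        raise ValueError("both segmentations must cover the same text")
    gold = to_spans(gold_words)
    pred = to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical example: manual segmentation vs. a tool's output.
gold = ["会员", "专享", "优惠"]
pred = ["会员", "专", "享", "优惠"]
p, r, f = prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Comparing character spans rather than word strings avoids counting a word as correct when it appears at a different position in the sentence. In a study like this one, the per-sentence counts would typically be summed over the whole test set before computing the three scores.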
CITATION STYLE
Zhang, X., Wu, P., Cai, J., & Wang, K. (2019). A Contrastive Study of Chinese Text Segmentation Tools in Marketing Notification Texts. In Journal of Physics: Conference Series (Vol. 1302). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1302/2/022010