A hybrid approach of text segmentation based on sensitive word concept for NLP


Abstract

Natural language processing tasks such as text checking and correction, machine translation, and information retrieval usually start from words. Identifying words in Indo-European languages is trivial. For various Asian languages such as Chinese, however, this problem, called text segmentation, has been and remains a bottleneck. Two main groups of approaches to Chinese segmentation exist: dictionary-based approaches and statistical approaches. Both have difficulty dealing with certain Chinese texts. To address these difficulties, we propose a hybrid approach to Chinese text segmentation based on the concept of sensitive words. Sensitive words are compound words whose syntactic category differs from those of their components. Depending on how a text is segmented, a sensitive word may play different roles, leading to significantly different syntactic structures. In this paper, we first explain the concept of sensitive words and their efficacy in text segmentation, then describe the hybrid approach, which combines a rule-based method and a probability-based method using the sensitive word concept. Our experimental results show that the proposed approach addresses the text segmentation problems effectively.
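To make the ambiguity concrete, the classic dictionary-based baseline mentioned in the abstract is forward maximum matching: at each position, greedily take the longest dictionary word. The sketch below is illustrative only, with a hypothetical toy dictionary, and is not the paper's actual algorithm; it shows how the presence or absence of a compound word in the dictionary changes the segmentation of the same string.

```python
# Illustrative sketch of forward maximum matching, a standard
# dictionary-based Chinese segmenter. The toy dictionary and the
# example sentence are hypothetical, not taken from the paper.
def forward_max_match(text, dictionary, max_len=5):
    """Greedily match the longest dictionary word at each position;
    unmatched characters fall back to single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens

# 发展中国家 ("developing country") is a compound; its components
# 发展 ("develop") and 中国 ("China") are also words. If the compound
# is in the dictionary, it is kept whole; if not, the greedy matcher
# splits the same string differently.
with_compound = {"发展", "中国", "国家", "发展中国家"}
without_compound = {"发展", "中国", "国家"}
print(forward_max_match("发展中国家", with_compound))     # ['发展中国家']
print(forward_max_match("发展中国家", without_compound))  # ['发展', '中国', '家']
```

This is exactly the kind of case where a segmentation decision changes the syntactic structure of the sentence, which motivates treating such compounds as sensitive words.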

APA

Ren, F. (2001). A hybrid approach of text segmentation based on sensitive word concept for NLP. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2004, pp. 375–388). Springer Verlag. https://doi.org/10.1007/3-540-44686-9_37
