Word segmentation and POS tagging for Chinese keyphrase extraction

Abstract

Keyphrases are essential for many text mining applications. This paper proposes a system for automatically extracting keyphrases from Chinese text. To address a particular problem of Chinese information processing, a lexicon-based word segmentation approach is presented. For this purpose, a verb lexicon, a function word lexicon and a stop word lexicon are constructed. A predefined keyphrase lexicon is applied to improve extraction performance. The approach uses a small Part-Of-Speech (POS) tagset to index phrases simply according to these lexicons. It is especially effective for identifying phrases formed by combinations of nouns, adjectives and verbs. Keyphrases are selected by their weighted TF-IDF (Term occurrence Frequency-Inverse Document Frequency) values. New keyphrases are added to the keyphrase lexicon. © Springer-Verlag Berlin Heidelberg 2005.
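
To illustrate the final ranking step, here is a minimal Python sketch of TF-IDF scoring over candidate phrases. The abstract does not state which weights the authors apply, so this uses plain smoothed TF-IDF as an assumption; the function name `tfidf_scores`, the toy corpus, and the smoothing formula are all hypothetical and are not taken from the paper. Documents are assumed to be already segmented into candidate phrases.

```python
import math
from collections import Counter


def tfidf_scores(doc_phrases, corpus):
    """Score each candidate phrase of one document by smoothed TF-IDF.

    doc_phrases: list of candidate phrases from the target document.
    corpus: list of documents, each a list of candidate phrases.
    The smoothing used here is an assumption, not the paper's weighting.
    """
    tf = Counter(doc_phrases)
    n_docs = len(corpus)
    scores = {}
    for phrase, count in tf.items():
        # Document frequency: number of documents containing the phrase.
        df = sum(1 for doc in corpus if phrase in doc)
        # Smoothed IDF so that the value is always positive and finite.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[phrase] = (count / len(doc_phrases)) * idf
    return scores


# Toy, already-segmented corpus (hypothetical data for illustration only).
corpus = [
    ["文本挖掘", "关键词", "文本挖掘"],
    ["分词", "词性标注", "关键词"],
    ["分词", "文本挖掘"],
]
scores = tfidf_scores(corpus[0], corpus)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus both phrases of the first document appear in two documents, so they share the same IDF and the ranking is decided by term frequency alone. A system like the one described would presumably keep the top-ranked phrases and add new ones to the keyphrase lexicon.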

Huang, X., Chen, J., Yan, P., & Luo, X. (2005). Word segmentation and POS tagging for Chinese keyphrase extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3584 LNAI, pp. 364–369). Springer Verlag. https://doi.org/10.1007/11527503_44
