Combining statistical machine learning models to extract keywords from Chinese documents

Chengzhi Zhang

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5678 LNAI 745-754

DOI: 10.1007/978-3-642-03348-3_79


Abstract

Keywords are a subset of the words or phrases in a document that describe its meaning, and many text mining applications can benefit from them. Unfortunately, a large portion of documents still do not have keywords assigned, and manual assignment of high-quality keywords is time-consuming and error-prone. Therefore, many algorithms and systems have been proposed to help people perform automatic keyword extraction. However, most automatic keyword extraction methods cannot use the features of documents effectively. This paper proposes a method that integrates statistical machine learning models: it extracts keywords from Chinese documents through voting among multiple keyword extraction models. Experimental results show that the proposed method, based on ensemble learning, outperforms other methods in terms of the F1 measure. Moreover, the ensemble keyword extraction model with weighted voting outperforms the one without weighted voting. © 2009 Springer.
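The abstract does not specify the base models, their features, or how the voting weights are chosen. As a rough illustration only, the sketch below assumes each base model has already produced a set of candidate keywords for the same document, then contrasts plain voting with weighted voting and scores the result with F1. The names `vote_keywords` and `f1`, the "more than half of the total weight" acceptance rule, the example keywords, and the weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def vote_keywords(predictions, weights=None, threshold=0.5):
    """Combine keyword sets from several base extractors by (weighted) voting.

    predictions: list of sets, one per base model, each holding the keywords
        that model selected for the same document.
    weights: optional per-model weights (hypothetical, e.g. each model's F1 on
        held-out data). None gives every model one equal vote.
    threshold: a candidate is kept only if its share of the total weight
        strictly exceeds this value (0.5 means a strict majority).
    """
    if weights is None:
        weights = [1.0] * len(predictions)  # unweighted voting
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    score = defaultdict(float)
    for keywords, w in zip(predictions, weights):
        for kw in keywords:
            score[kw] += w  # each model adds its weight to every keyword it picked
    return {kw for kw, s in score.items() if s / total > threshold}


def f1(predicted, gold):
    """F1 of a predicted keyword set against a gold-standard keyword set."""
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): three base models, one document.
models = [
    {"机器学习", "关键词", "文本"},
    {"机器学习", "文本"},
    {"机器学习", "关键词", "文本挖掘"},
]
gold = {"机器学习", "关键词", "文本挖掘"}

plain = vote_keywords(models)                       # equal votes
weighted = vote_keywords(models, weights=[1, 1, 3])  # trust model 3 more
print(sorted(plain), f1(plain, gold))
print(sorted(weighted), f1(weighted, gold))
```

In this toy case, plain voting keeps the spurious "文本" (two of three votes) and misses "文本挖掘", giving F1 = 2/3, while weighting the hypothetically more reliable third model recovers the gold set exactly. The example only shows the mechanics of the two voting schemes; it says nothing about the magnitudes reported in the paper.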


Citation (APA)

Zhang, C. (2009). Combining statistical machine learning models to extract keywords from Chinese documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5678 LNAI, pp. 745–754). https://doi.org/10.1007/978-3-642-03348-3_79
