A new scheme for scoring phrases in unsupervised keyphrase extraction

Corina Florescu; Cornelia Caragea

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017), Vol. 10193 LNCS, pp. 477–483

DOI: 10.1007/978-3-319-56608-5_37

14 Citations

19 Readers

Abstract

Many unsupervised methods for keyphrase extraction compute a score for each word in a document using measures such as tf-idf or the PageRank score computed on a word graph built from the document. The final score of a candidate phrase is then calculated by summing the scores of its constituent words. A potential problem with this sum-up scoring scheme is that the length of a phrase strongly affects its score, since every additional word adds to the total. To reduce this effect and extract keyphrases of varied lengths, we propose a new scheme for scoring phrases, which computes the final score as the average of the scores of the individual words, weighted by the frequency of the phrase in the document. We show experimentally that unsupervised approaches that use this new scheme outperform their counterparts that use the sum-up scheme to score phrases.
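The two scoring schemes contrasted in the abstract can be illustrated with a minimal sketch. It reads "average of the word scores weighted by the frequency of the phrase" as mean word score multiplied by the phrase count, which is an assumption based on the abstract alone and not the authors' exact formula. The function names, the toy word scores (standing in for tf-idf or PageRank values), and the candidate list are all hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

Phrase = Tuple[str, ...]


def sum_score(phrase: Phrase, word_scores: Dict[str, float]) -> float:
    """Sum-up scheme: add the scores of the constituent words."""
    return sum(word_scores.get(word, 0.0) for word in phrase)


def mean_freq_score(
    phrase: Phrase,
    word_scores: Dict[str, float],
    phrase_freq: Dict[Phrase, int],
) -> float:
    """Assumed reading of the proposed scheme:
    mean word score multiplied by the phrase frequency in the document."""
    if not phrase:
        return 0.0
    mean = sum_score(phrase, word_scores) / len(phrase)
    return mean * phrase_freq.get(phrase, 0)


# Hypothetical per-word scores (e.g. from tf-idf or PageRank on a word graph).
word_scores = {
    "keyphrase": 0.4,
    "extraction": 0.3,
    "unsupervised": 0.2,
    "scheme": 0.1,
}

# Hypothetical candidate phrase occurrences found in one document.
occurrences = [
    ("keyphrase", "extraction"),
    ("keyphrase", "extraction"),
    ("unsupervised", "keyphrase", "extraction"),
    ("scheme",),
]
phrase_freq = Counter(occurrences)

short = ("keyphrase", "extraction")
long_ = ("unsupervised", "keyphrase", "extraction")

# Under the sum-up scheme the longer phrase wins only because it has more words.
print(sum_score(short, word_scores), sum_score(long_, word_scores))

# Under the averaged, frequency-weighted scheme the more frequent phrase wins.
print(
    mean_freq_score(short, word_scores, phrase_freq),
    mean_freq_score(long_, word_scores, phrase_freq),
)
```

In this toy document the three-word phrase ranks first under the sum-up scheme, while the two-word phrase that occurs twice ranks first once length is averaged out and frequency is taken into account.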

Cite

Florescu, C., & Caragea, C. (2017). A new scheme for scoring phrases in unsupervised keyphrase extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10193 LNCS, pp. 477–483). Springer Verlag. https://doi.org/10.1007/978-3-319-56608-5_37
