A new scheme for scoring phrases in unsupervised keyphrase extraction

Abstract

Many unsupervised methods for keyphrase extraction compute a score for each word in a document using measures such as tf-idf or the PageRank score obtained from a word graph built from the document. The final score of a candidate phrase is then calculated by summing the scores of its constituent words. A potential problem with this sum-up scoring scheme is that the length of a phrase strongly affects its score. To reduce this effect and extract keyphrases of varied lengths, we propose a new scheme for scoring phrases that calculates the final score as the average of the scores of the individual words, weighted by the frequency of the phrase in the document. We show experimentally that unsupervised approaches using this new scheme outperform their counterparts that use the sum-up scheme to score phrases.
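
The contrast between the two schemes can be illustrated with a minimal sketch. It follows one plausible reading of the abstract, in which the new score is the mean of the word scores multiplied by the phrase frequency; the paper's exact formula may differ. The function names, the word scores, and the phrase frequencies below are hypothetical values chosen for illustration, not data from the paper.

```python
from typing import Dict, Sequence


def sum_up_score(phrase: Sequence[str], word_scores: Dict[str, float]) -> float:
    """Sum-up scheme: add the scores of the constituent words.

    Words missing from word_scores contribute 0.0 (an assumption of this sketch).
    """
    return sum(word_scores.get(word, 0.0) for word in phrase)


def mean_times_frequency_score(
    phrase: Sequence[str],
    word_scores: Dict[str, float],
    phrase_frequency: int,
) -> float:
    """Assumed form of the new scheme: mean word score times phrase frequency."""
    if not phrase:
        return 0.0
    mean_score = sum_up_score(phrase, word_scores) / len(phrase)
    return phrase_frequency * mean_score


# Hypothetical per-word scores (e.g. from tf-idf or PageRank on a word graph).
word_scores = {"keyphrase": 0.5, "extraction": 0.25, "unsupervised": 0.125}

short_phrase = ("keyphrase",)
long_phrase = ("unsupervised", "keyphrase", "extraction")

# Hypothetical counts of how often each phrase occurs in the document.
phrase_frequency = {short_phrase: 3, long_phrase: 1}

for phrase in (short_phrase, long_phrase):
    summed = sum_up_score(phrase, word_scores)
    averaged = mean_times_frequency_score(
        phrase, word_scores, phrase_frequency[phrase]
    )
    print(" ".join(phrase), summed, averaged)
```

With these made-up numbers, the sum-up scheme ranks the three-word phrase first simply because it has more words, while the averaged, frequency-weighted score ranks the frequent one-word phrase first.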

Cite

Florescu, C., & Caragea, C. (2017). A new scheme for scoring phrases in unsupervised keyphrase extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10193 LNCS, pp. 477–483). Springer Verlag. https://doi.org/10.1007/978-3-319-56608-5_37
