Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases

Nils Witt; Tobias Milz; Christin Seifert

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11198 LNAI 373-385

DOI: 10.1007/978-3-030-01771-2_24

Citations: 1

Readers: 5

Abstract

Automatic keyphrase extraction attempts to capture keywords that describe a document accurately and comprehensively. Unsupervised algorithms for extractive keyphrase extraction, i.e., those that filter the keyphrases from the text without external knowledge, generally suffer from low precision and low recall. In this paper, we propose scoring the extracted keyphrases as a post-processing step that reranks the list of extracted phrases in order to improve precision and recall, particularly for the top phrases. The approach is based on the tf-idf score of the keyphrases and is agnostic of the underlying method used for the initial extraction. Experiments show an increase of up to 14% in F1 at 5 keyphrases on the most difficult of 4 corpora. We also show that this increase is mostly due to improvements on documents with very low F1 scores. Thus, our scoring and aggregation approach appears to be a promising route to robust, unsupervised keyphrase extraction with a special focus on the most important keyphrases.
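
The abstract does not spell out the scoring or aggregation formula, so the following is only a minimal sketch of the general idea: candidates produced by any extractor are reranked by a tf-idf based score, and the result is evaluated with F1 at k. The function names (`phrase_tfidf`, `rerank`, `f1_at_k`), the whitespace tokenizer, the smoothed idf, and the choice to sum word-level tf-idf weights per phrase are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def phrase_tfidf(phrase, doc_tokens, corpus_token_sets):
    # Assumed aggregation: sum the tf-idf weights of the words in the phrase.
    tf = Counter(doc_tokens)
    n_docs = len(corpus_token_sets)
    score = 0.0
    for word in tokenize(phrase):
        df = sum(1 for tokens in corpus_token_sets if word in tokens)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed idf
        score += tf[word] * idf
    return score


def rerank(candidates, doc, corpus):
    # Post-processing step: reorder candidates from any extractor by score.
    doc_tokens = tokenize(doc)
    corpus_token_sets = [set(tokenize(d)) for d in corpus]
    return sorted(
        candidates,
        key=lambda p: phrase_tfidf(p, doc_tokens, corpus_token_sets),
        reverse=True,
    )


def f1_at_k(ranked, gold, k):
    # F1 over the top-k ranked phrases against the gold keyphrases.
    top = ranked[:k]
    hits = len(set(top) & set(gold))
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
corpus = [
    "keyphrase extraction ranks keyphrase candidates",
    "the model ranks documents",
    "the parser reads documents",
]
candidates = ["ranks", "keyphrase extraction", "candidates"]
gold = ["keyphrase extraction"]

reranked = rerank(candidates, corpus[0], corpus)
print(reranked, f1_at_k(candidates, gold, 1), f1_at_k(reranked, gold, 1))
```

Because the reranking only consumes a list of candidate phrases, it can sit behind any extractor, which is the sense in which such a post-processing step is agnostic of the initial extraction method.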

Cite

Witt, N., Milz, T., & Seifert, C. (2018). Most Important First – Keyphrase Scoring for Improved Ranking in Settings With Limited Keyphrases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11198 LNAI, pp. 373–385). Springer Verlag. https://doi.org/10.1007/978-3-030-01771-2_24
