Unsupervised topic-oriented keyphrase extraction and its application to Croatian

Josip Saratlija; Jan Šnajder; Bojana Dalbelo Bašić

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011), Vol. 6836 LNAI, pp. 340–347

DOI: 10.1007/978-3-642-23538-2_43

Abstract

Labeling documents with keyphrases is a tedious and expensive task. Most approaches to automatic keyphrase extraction rely on supervised learning and require manually labeled training data. In this paper we propose a fully unsupervised keyphrase extraction method that differs from the usual generic keyphrase extractors in the way the keyphrases are formed. Our method first builds topically related word clusters from which document keywords are selected, and then expands the selected keywords into syntactically valid keyphrases. We evaluate our approach on a Croatian document collection annotated by eight human experts, taking into account the high subjectivity of the keyphrase extraction task. The proposed method reaches up to F1 = 44.5%, which is below the performance of human annotators but comparable to that of a supervised approach. © 2011 Springer-Verlag.
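For readers unfamiliar with the F1 measure quoted in the abstract, the following is a minimal sketch of how precision, recall, and F1 can be computed between an extracted keyphrase set and a gold set. The function names `keyphrase_f1` and `mean_f1_over_annotators` are hypothetical, exact matching after lowercasing is an assumption, and averaging over annotators is only one plausible way of handling several annotations per document; the paper's actual evaluation protocol is not described in this record.

```python
def keyphrase_f1(extracted, gold):
    """Return (precision, recall, F1) for two collections of keyphrases.

    Assumption: phrases match only if they are identical after
    stripping whitespace and lowercasing.
    """
    extracted_set = {p.strip().lower() for p in extracted}
    gold_set = {p.strip().lower() for p in gold}
    if not extracted_set or not gold_set:
        return 0.0, 0.0, 0.0

    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set)
    recall = hits / len(gold_set)
    if hits == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_f1_over_annotators(extracted, annotator_sets):
    """Average F1 of one extracted set against each annotator's set.

    Assumption: a simple unweighted mean; not necessarily the
    scheme used in the paper.
    """
    scores = [keyphrase_f1(extracted, gold)[2] for gold in annotator_sets]
    return sum(scores) / len(scores)
```

With several annotators, a system is scored against each annotator's set separately, so that disagreement among annotators lowers the achievable score instead of being hidden by a single merged gold standard.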

Cite

APA

Saratlija, J., Šnajder, J., & Dalbelo Bašić, B. (2011). Unsupervised topic-oriented keyphrase extraction and its application to Croatian. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6836 LNAI, pp. 340–347). https://doi.org/10.1007/978-3-642-23538-2_43
