Exploring word embeddings in CRF-based keyphrase extraction from research papers

Krutarth Patel; Cornelia Caragea

Conference Proceedings

K-CAP 2019 - Proceedings of the 10th International Conference on Knowledge Capture (2019) 37-44

DOI: 10.1145/3360901.3364447

20 Citations

21 Readers

Abstract

Keyphrases associated with research papers provide an effective way to find useful information in large and growing scholarly digital collections. However, keyphrases are not always provided with the papers and must instead be extracted from their content. In this paper, we explore keyphrase extraction formulated as sequence labeling and utilize the power of Conditional Random Fields in capturing label dependencies through a transition parameter matrix consisting of the transition probabilities from one label to the neighboring label. We aim to identify the features that, by themselves or in combination with others, perform well in extracting the descriptive keyphrases for a paper. Specifically, we explore word embeddings as features along with traditional, document-specific features for keyphrase extraction. Our results on five datasets of research papers show that word embeddings combined with document-specific features achieve high performance and outperform strong baselines for this task.
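To make the role of the transition matrix concrete, the following is a minimal sketch of Viterbi decoding over a two-label tag set. It is not the authors' implementation: the label names (`O`, `KP`), the function name `viterbi`, and every score are assumptions invented for illustration. The per-token emission scores stand in for whatever a trained model would compute from word embeddings and document-specific features, and the sketch uses unnormalized additive scores rather than the transition probabilities the abstract mentions.

```python
# Illustrative sketch only: labels, tokens, and all scores are made up.
LABELS = ["O", "KP"]  # assumed tag set: outside / part of a keyphrase


def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at token t (derived from features)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    # best[t][j] = best score of any path that ends in label j at token t
    best = [list(emissions[0])]
    back = []  # back[t-1][j] = best previous label for label j at token t
    for t in range(1, len(emissions)):
        row, ptr = [], []
        for j in range(n_labels):
            cands = [best[-1][i] + transitions[i][j] for i in range(n_labels)]
            i_best = max(range(n_labels), key=cands.__getitem__)
            row.append(cands[i_best] + emissions[t][j])
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final label to recover the full path
    j = max(range(n_labels), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [LABELS[j] for j in path]


tokens = ["we", "study", "conditional", "random", "fields"]
emissions = [  # columns: [score for O, score for KP]
    [2.0, 0.0],
    [1.5, 0.2],
    [0.4, 1.0],
    [0.9, 0.8],  # "random" slightly prefers O when scored in isolation
    [0.3, 1.2],
]
transitions = [
    [0.5, 0.0],  # O -> O, O -> KP
    [0.0, 1.0],  # KP -> O, KP -> KP (rewards staying inside a keyphrase)
]

tagged = viterbi(emissions, transitions)
# Per-token argmax that ignores transitions, for comparison
independent = [LABELS[max(range(2), key=e.__getitem__)] for e in emissions]
print(list(zip(tokens, tagged)))
```

In this toy example, labeling each token independently would mark "random" as `O` and split the phrase, whereas the `KP -> KP` transition score lets the decoder keep "conditional random fields" together as one keyphrase span.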

Cite

Patel, K., & Caragea, C. (2019). Exploring word embeddings in CRF-based keyphrase extraction from research papers. In K-CAP 2019 - Proceedings of the 10th International Conference on Knowledge Capture (pp. 37–44). Association for Computing Machinery, Inc. https://doi.org/10.1145/3360901.3364447
