NamedKeys: Unsupervised keyphrase extraction for biomedical documents

Zelalem Gero; Joyce C. Ho

Conference Proceedings, Open Access

ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (2019) 328-337

DOI: 10.1145/3307339.3342147

Abstract

A vast amount of biomedical literature is generated and digitized every year. As a result, there is a growing need for methods to discover, access, and share knowledge from the medical literature. Keyphrase extraction is the task of summarizing a text by identifying its key concepts. Keyphrases can be single-word or multi-word linguistic units that concisely represent a document. Although a variety of models have been proposed for automated keyphrase extraction, performance remains poor in comparison with other natural language processing tasks. The problem is even more daunting in the biomedical domain, where text is filled with highly domain-specific terminology. We propose a new method, NamedKeys, to automatically identify meaningful and informative keyphrases from biomedical text. NamedKeys integrates named entity recognition, phrase embedding, phrase quality scoring, ranking, and clustering to extract author-assigned keywords from biomedical documents. Performance evaluation on PubMed abstracts demonstrates that NamedKeys achieves significant improvements over existing state-of-the-art keyphrase extraction models. Furthermore, we propose the first benchmark dataset for keyphrase extraction from biomedical text.
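The abstract names the stages of the pipeline but gives no implementation detail, so the following is only a toy sketch of the general idea of embedding, scoring, ranking, and grouping candidate phrases. It is not the NamedKeys method. Everything in it is an assumption made for illustration: the candidates are supplied by hand rather than produced by named entity recognition, a bag-of-words count vector stands in for a learned phrase embedding, cosine similarity to the document stands in for phrase quality scoring, and greedy near-duplicate filtering stands in for clustering. The names `embed`, `cosine`, `rank_keyphrases`, `top_k`, and `dedupe_threshold` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumed stand-in for a learned phrase embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keyphrases(document, candidates, top_k=3, dedupe_threshold=0.5):
    """Score candidates against the document, rank them, and drop near-duplicates."""
    doc_vec = embed(document)
    # Scoring and ranking: highest similarity first, ties broken alphabetically.
    scored = sorted(
        ((cosine(embed(c), doc_vec), c) for c in candidates),
        key=lambda pair: (-pair[0], pair[1]),
    )
    selected = []
    for _score, phrase in scored:
        # Greedy grouping (assumed stand-in for clustering): skip a phrase
        # that is too similar to one that has already been selected.
        if all(cosine(embed(phrase), embed(kept)) < dedupe_threshold for kept in selected):
            selected.append(phrase)
        if len(selected) == top_k:
            break
    return selected


document = (
    "Keyphrase extraction summarizes biomedical text. Keyphrase extraction "
    "for biomedical documents uses named entity recognition."
)
# Hand-supplied candidates; a real system would obtain these from an NER model.
candidates = ["keyphrase extraction", "extraction", "named entity recognition", "protein folding"]
print(rank_keyphrases(document, candidates, top_k=2))
```

In this sketch the phrase "extraction" is dropped because it overlaps heavily with the higher-ranked "keyphrase extraction", which is the kind of redundancy a grouping step is meant to remove.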

Citation (APA)

Gero, Z., & Ho, J. C. (2019). NamedKeys: Unsupervised keyphrase extraction for biomedical documents. In ACM-BCB 2019 - Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (pp. 328–337). Association for Computing Machinery, Inc. https://doi.org/10.1145/3307339.3342147
