In this paper, we have proposed a novel word embedding-based biomedical text summarizer. Biomedical words are represented by real dense vectors. Sentences are represented by summing-up the word vectors that contain. The PageRank algorithm is applied to rank sentences using the cosine similarity as a distance measure between sentences vectors. The top N highly ranked sentences are selected to build the summary. For the evaluation, we created a corpus of 200 biomedical papers downloaded from the Biomed Central full-text database. We used a pre-trained Word2vec model of word vectors generated from a combination of PubMed, PMC, and recent English Wikipedia dump texts. We compared our method with four other summarizers using: ROUGE-1, ROUGE-2, ROUGE-3, and ROUGE-SU4 metrics by evaluating the generated summaries with the abstracts of papers. Our summarizer achieved an improvement of 3.48%, 7.68%, 9.76%, and 3.47% respectively against the second-ranked summarizer.
CITATION STYLE
Rouane, O., Belhadef, H., & Bouakkaz, M. (2020). Word embedding-based biomedical text summarization. In Advances in Intelligent Systems and Computing (Vol. 1073, pp. 288–297). Springer. https://doi.org/10.1007/978-3-030-33582-3_28
Mendeley helps you to discover research relevant for your work.