Improvement of TextRank based on co-occurrence word pairs and context information

Abstract

TextRank, a widely used keyword extraction algorithm, models the relationships between words with a graph. However, high-frequency words have more opportunities to co-occur with other words, so extracting keywords from co-occurrence relationships alone overlooks some unrecognized words. TextRank also builds its graph from a single document, which makes it less effective on related documents because it misses the context information available in the document collection. This paper proposes an improved TextRank algorithm. First, to introduce external document features and account for the relationships between documents, all co-occurrence word pairs in the document collection are extracted by association rule mining. Then the co-occurrence frequency in the TextRank score formula is replaced with the mutual information between the co-occurring word pairs, which takes into account pairs that co-occur less often. In addition, the context entropy of each word in the collection is calculated. Finally, a new TextRank score formula is constructed by adding the context entropy to the modified score formula, each with its own weight. To test its effectiveness, an experiment with five combinations of scoring weights compares the improved algorithm with the original TextRank and TF-IDF on two different types of datasets (a public Chinese dataset and a financial dataset crawled from the internet). The experimental results show that, when the two parts are weighted equally, the improved TextRank algorithm outperforms the others.
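
The abstract does not give the formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes several things the abstract does not state: a sliding-window co-occurrence count stands in for association rule mining, positive pointwise mutual information is used as the edge weight, context entropy is taken as the sum of left-neighbor and right-neighbor entropies, and both scores are max-normalized before being combined. All function names and parameters (`pair_weights`, `context_entropy`, `weighted_textrank`, `rank_keywords`, `alpha`, `beta`, `window`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pair_weights(docs, window=3):
    """Positive PMI of word pairs that co-occur in a sliding window (assumed weighting)."""
    word_count, pair_count = Counter(), Counter()
    total_windows = 0
    for tokens in docs:
        for i in range(len(tokens)):
            span = set(tokens[i:i + window])
            total_windows += 1
            word_count.update(span)
            pair_count.update(combinations(sorted(span), 2))
    weights = {}
    for (a, b), n_ab in pair_count.items():
        pmi = math.log(n_ab * total_windows / (word_count[a] * word_count[b]))
        if pmi > 0:  # keep edge weights strictly positive
            weights[(a, b)] = pmi
    return weights


def context_entropy(docs):
    """Entropy of left neighbors plus entropy of right neighbors, per word (assumed definition)."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for tokens in docs:
        for prev, cur in zip(tokens, tokens[1:]):
            right[prev][cur] += 1
            left[cur][prev] += 1

    def entropy(counter):
        total = sum(counter.values())
        return -sum(c / total * math.log(c / total) for c in counter.values())

    words = {w for tokens in docs for w in tokens}
    return {w: entropy(left[w]) + entropy(right[w]) for w in words}


def weighted_textrank(weights, damping=0.85, iterations=50):
    """Standard weighted TextRank iteration over an undirected graph."""
    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    out_sum = {v: sum(nb.values()) for v, nb in neighbors.items()}
    score = {v: 1.0 for v in neighbors}
    for _ in range(iterations):
        score = {
            v: (1 - damping) + damping * sum(
                w / out_sum[u] * score[u] for u, w in neighbors[v].items())
            for v in neighbors
        }
    return score


def normalize(values):
    """Scale scores to [0, 1] by the maximum so the two parts are comparable."""
    top = max(values.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in values.items()}


def rank_keywords(docs, alpha=0.5, beta=0.5, top_k=5):
    """Weighted sum of the graph score and the context entropy (assumed combination)."""
    graph_score = normalize(weighted_textrank(pair_weights(docs)))
    entropy_score = normalize(context_entropy(docs))
    combined = {w: alpha * graph_score.get(w, 0.0) + beta * s
                for w, s in entropy_score.items()}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]
```

Calling `rank_keywords(docs)` with `docs` as a list of token lists returns the top-ranked words; setting `alpha` equal to `beta` corresponds to the equal-weight setting that the abstract reports as performing best.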

Cite

Wang, Y., Yin, H., & He, M. (2018). Improvement of TextRank based on co-occurrence word pairs and context information. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11344 LNCS, pp. 226–235). Springer Verlag. https://doi.org/10.1007/978-3-030-05755-8_23
