In this paper, we present a weighted graph based method to simultaneously compare the textual content of two or more documents and extract the shared (sub)topics of them, if available. A set of documents are modelled with a set of pairwise weighted bipartite graphs. A generalized mutual reinforcement principle is applied to the pairwise bipartite graphs to calculate the saliency scores of sentences in each documents based on pairwise weighted bipartite graphs. Sentences with advantaged saliency are selected, and they together convey the dominant shared topic. If there are more than one shared subtopics among the documents, a spectral min-max cut algorithm can be used to partition a derived sentence similarity graph into several subgraphs. For a subgraph, if all documents contribute some sentences(nodes) to it, then these sentences(nodes) in the subgraph may convey a shared subtopic. The generalized mutual reinforcement principle is applied to them to verify and extract the shared subtopic.
CITATION STYLE
Ji, X., & Zha, H. (2003). Extracting shared topics of multiple documents. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2637, pp. 100–110). Springer Verlag. https://doi.org/10.1007/3-540-36175-8_10
Mendeley helps you to discover research relevant for your work.