Extracting shared topics of multiple documents

Xiang Ji; Hongyuan Zha

Conference Proceedings

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2003) 2637 100-110

DOI: 10.1007/3-540-36175-8_10

Abstract

In this paper, we present a weighted-graph-based method that simultaneously compares the textual content of two or more documents and extracts their shared (sub)topics, if any exist. A set of documents is modelled as a set of pairwise weighted bipartite graphs. A generalized mutual reinforcement principle is applied to these graphs to calculate the saliency scores of the sentences in each document. Sentences with high saliency are selected, and together they convey the dominant shared topic. If there is more than one shared subtopic among the documents, a spectral min-max cut algorithm can be used to partition a derived sentence similarity graph into several subgraphs. If all documents contribute some sentences (nodes) to a subgraph, then those sentences may convey a shared subtopic. The generalized mutual reinforcement principle is then applied to them to verify and extract the shared subtopic.
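
The abstract does not give the update equations, so the following is only a minimal sketch of how mutual reinforcement on a single weighted bipartite graph is commonly formulated: a sentence in one document is salient if it is strongly connected to salient sentences in the other, and the scores are found by alternating updates (a power iteration). The function names, the `weights` matrix layout, and the unit-norm scaling are assumptions made for illustration and are not taken from the paper; the generalized, multi-document version described above is not reproduced here.

```python
import math


def _normalize(scores):
    """Scale a score vector to unit Euclidean length (assumed convention)."""
    norm = math.sqrt(sum(s * s for s in scores))
    if norm == 0:
        return scores
    return [s / norm for s in scores]


def mutual_reinforcement(weights, iterations=100, tol=1e-10):
    """Alternating saliency updates on one weighted bipartite graph.

    weights[i][j] is a non-negative similarity between sentence i of
    document A and sentence j of document B (hypothetical input format).
    Returns (u, v): saliency scores for the sentences of A and of B.
    """
    n_a, n_b = len(weights), len(weights[0])
    u = [1.0 / math.sqrt(n_a)] * n_a
    v = [1.0 / math.sqrt(n_b)] * n_b
    for _ in range(iterations):
        # A sentence in A gains saliency from the salient sentences in B
        # that it is similar to.
        new_u = _normalize(
            [sum(weights[i][j] * v[j] for j in range(n_b)) for i in range(n_a)]
        )
        # Symmetrically, sentences in B are rescored from the updated A scores.
        new_v = _normalize(
            [sum(weights[i][j] * new_u[i] for i in range(n_a)) for j in range(n_b)]
        )
        delta = max(
            max(abs(a - b) for a, b in zip(u, new_u)),
            max(abs(a - b) for a, b in zip(v, new_v)),
        )
        u, v = new_u, new_v
        if delta < tol:
            break
    return u, v


# Toy example: document A has three sentences, document B has two.
# The second sentence of A is barely similar to anything in B.
toy_weights = [
    [0.9, 0.8],
    [0.1, 0.0],
    [0.7, 0.9],
]
scores_a, scores_b = mutual_reinforcement(toy_weights)
```

In this toy input, the second sentence of document A receives the lowest score, so it would not be selected as part of the shared topic under a "keep the highest-scoring sentences" rule.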

Cite

Ji, X., & Zha, H. (2003). Extracting shared topics of multiple documents. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2637, pp. 100–110). Springer Verlag. https://doi.org/10.1007/3-540-36175-8_10
