Measuring semantic similarity between documents is an important problem because it underlies many applications, such as document summarization, web search, and text analysis. Although many studies have addressed this problem by enriching document vectors with the relatedness of the words involved, performance remains unsatisfactory because of data insufficiency, i.e., sparse and anomalous co-occurrences between words; insufficient data can only yield unreliable word relatedness. In this paper, we propose an effective approach to correcting this unreliable relatedness: throughout the generation of the relatedness, the joint probability of each word co-occurring with itself is kept consistently equal to its occurrence probability. The unreliable relatedness is thus corrected by reference to the occurrence frequencies of the words, which we confirm both theoretically and experimentally. A thorough evaluation on real datasets shows that our approach achieves significant improvements in document clustering over state-of-the-art methods.
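The consistency constraint described above can be sketched as follows. This is a minimal illustration of the idea, not the paper's actual algorithm: it assumes a hypothetical co-occurrence count matrix and word occurrence counts, pins each word's self joint probability to its occurrence probability, and then row-normalizes to obtain a relatedness matrix.

```python
import numpy as np

def consistency_corrected_relatedness(cooc, occ_counts):
    """Illustrative sketch of the joint-probability consistency idea.

    `cooc` is a hypothetical word-by-word co-occurrence count matrix and
    `occ_counts` the per-word occurrence counts; neither is the paper's API.
    The diagonal P(w, w) is pinned to the occurrence probability P(w), so
    sparse or anomalous self co-occurrence counts cannot distort it.
    """
    occ = occ_counts / occ_counts.sum()        # occurrence probabilities P(w)
    joint = cooc / cooc.sum()                  # empirical joint probabilities P(wi, wj)
    np.fill_diagonal(joint, occ)               # enforce P(w, w) = P(w) throughout
    # turn the corrected joint probabilities into row-normalized
    # relatedness values P(wj | wi)
    relatedness = joint / joint.sum(axis=1, keepdims=True)
    return joint, relatedness
```

Under this sketch, the corrected joint matrix has the required diagonal by construction, and each row of the relatedness matrix sums to one.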
Citation:
Wei, Y., Wei, J., Yang, Z., & Liu, Y. (2016). Joint probability consistent relation analysis for document representation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9642, pp. 517–532). Springer Verlag. https://doi.org/10.1007/978-3-319-32025-0_32