Joint probability consistent relation analysis for document representation

Abstract

Measuring the semantic similarity between documents is an important problem because it underlies many applications, such as document summarization, web search, and text analysis. Although many studies have addressed this problem by enriching document vectors with the relatedness of the words involved, performance remains far from satisfactory because of insufficient data, i.e., sparse and anomalous co-occurrences between words. Such insufficient data can only yield unreliable relatedness between words. In this paper, we propose an effective approach to correct the unreliable relatedness: throughout the generation of the relatedness, it keeps the joint probability of each word co-occurring with itself consistently equal to that word's occurrence probability. The unreliable relatedness is thus corrected by referring to the occurrence frequencies of the words, which is confirmed both theoretically and experimentally. A thorough evaluation on real datasets shows a significant improvement in document clustering over state-of-the-art methods.
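
The consistency condition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm; it only shows, on a toy corpus, what it means for the joint probability of a word with itself to equal its occurrence probability. The function names (`occurrence_probs`, `joint_probs`, `is_consistent`), the document-level co-occurrence estimate, and the toy documents are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def occurrence_probs(docs):
    """P(w): fraction of documents that contain word w (assumed estimator)."""
    n = len(docs)
    vocab = sorted(set().union(*docs))
    return {w: sum(w in d for d in docs) / n for w in vocab}


def joint_probs(docs):
    """P(u, v): fraction of documents that contain both u and v."""
    n = len(docs)
    vocab = sorted(set().union(*docs))
    joint = {}
    for u, v in combinations_with_replacement(vocab, 2):
        p = sum(u in d and v in d for d in docs) / n
        joint[(u, v)] = joint[(v, u)] = p  # co-occurrence is symmetric
    return joint


def is_consistent(joint, occ, tol=1e-12):
    """Check that P(w, w) equals P(w) for every word w."""
    return all(abs(joint[(w, w)] - p) <= tol for w, p in occ.items())


# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"cat", "pet"},
    {"dog", "pet"},
    {"cat", "dog", "pet"},
    {"stock", "market"},
]

occ = occurrence_probs(docs)
joint = joint_probs(docs)
print(is_consistent(joint, occ))

# An estimate that drifts away from the occurrence probability on the
# diagonal breaks the condition.
drifted = dict(joint)
drifted[("cat", "cat")] = 0.9
print(is_consistent(drifted, occ))
```

With raw document-level counts the condition holds by construction. The abstract's point is that it should keep holding while the relatedness values are being generated, which this sketch does not attempt to reproduce.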

Wei, Y., Wei, J., Yang, Z., & Liu, Y. (2016). Joint probability consistent relation analysis for document representation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9642, pp. 517–532). Springer Verlag. https://doi.org/10.1007/978-3-319-32025-0_32
