Evaluating document-to-document relevance based on document language model: Modeling, implementation and performance evaluation

Abstract

Evaluating document-to-document relevance is important to many advanced applications in IR, text mining and natural language processing. Because users' uncertainty makes document relevance very hard to define mathematically, most research fields accept the concept of topical relevance instead. Under this view, a document relevance model should explain whether the document representation describes the document's topical content and whether the matching method reveals the topical differences among documents. However, current document-to-document relevance models, such as the vector space model and string distance, do not explicitly emphasize topical relevance. This paper uses a document language model to represent a document's topical content, explains why such a model can reveal the document's topics, and then establishes two distributional similarity measures based on the document language model to evaluate document-to-document relevance. An experiment on the TREC test collection compares the approach with the vector space model, and the results show that the Kullback-Leibler divergence measure with Jelinek-Mercer smoothing significantly outperforms the vector space model. © Springer-Verlag Berlin Heidelberg 2005.
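
As a rough illustration of the idea in the abstract, the sketch below builds Jelinek-Mercer smoothed unigram language models for a few toy documents and compares them with Kullback-Leibler divergence. It is not the authors' implementation: the whitespace tokenizer, the smoothing weight `lam = 0.5`, the direction of the divergence, the toy documents, and all function names are assumptions made for this example, and the abstract does not specify the second similarity measure, so only KL divergence is shown.

```python
import math
from collections import Counter


def unigram_counts(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return Counter(text.lower().split())


def jm_smoothed_model(doc_counts, collection_counts, lam=0.5):
    # Jelinek-Mercer smoothing: linear interpolation between the document's
    # maximum-likelihood estimate and the collection model,
    # p(w|d) = (1 - lam) * p_ml(w|d) + lam * p(w|C).
    # lam = 0.5 is an arbitrary choice for this sketch.
    doc_len = sum(doc_counts.values())
    coll_len = sum(collection_counts.values())
    return {
        w: (1 - lam) * doc_counts.get(w, 0) / doc_len + lam * cw / coll_len
        for w, cw in collection_counts.items()
    }


def kl_divergence(p, q):
    # KL(p || q) over a shared vocabulary. Smoothing keeps every q[w] > 0
    # as long as the collection contains all the documents being compared.
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


# Toy collection (hypothetical data, not from the TREC experiment).
docs = [
    "retrieval model ranks documents by query likelihood",
    "language model retrieval ranks documents by likelihood",
    "the cat sat on the warm mat",
]

collection_counts = Counter()
for d in docs:
    collection_counts.update(unigram_counts(d))

models = [jm_smoothed_model(unigram_counts(d), collection_counts) for d in docs]

# A lower divergence is read here as higher topical relevance.
near = kl_divergence(models[0], models[1])
far = kl_divergence(models[0], models[2])
print(f"KL(d0 || d1) = {near:.4f}")
print(f"KL(d0 || d2) = {far:.4f}")
```

In this toy setup the first two documents share most of their vocabulary, so the divergence between their models is smaller than the divergence from the unrelated third document. KL divergence is asymmetric, so a symmetric variant may be preferable when neither document plays the role of a query.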

Cite

Yu, G., Li, X., Bao, Y., & Wang, D. (2005). Evaluating document-to-document relevance based on document language model: Modeling, implementation and performance evaluation. In Lecture Notes in Computer Science (Vol. 3406, pp. 593–603). Springer Verlag. https://doi.org/10.1007/978-3-540-30586-6_63
