Word distribution analysis for relevance ranking and query expansion

4Citations
Citations of this article
3Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Apart from the frequency of terms in a document collection, the distribution of words plays an important role in determining the relevance of documents for a given search query. In this paper, word distribution analysis as a novel approach for using descriptive statistics to calculate a compressed representation of word positions in a document corpus is introduced. Based on this statistical approximation, two methods for improving the evaluation of document relevance are proposed: (a) a relevance ranking procedure based on how query terms are distributed over initially retrieved documents, and (b) a query expansion technique based on overlapping the distributions of terms in the top-ranked documents. Experimental results obtained for the TREC-8 document collection demonstrate that the proposed approach leads to an improvement of about 6.6% over the term frequency/inverse document frequency weighting scheme without applying query reformulation or relevance feedback techniques. © 2008 Springer-Verlag Berlin Heidelberg.

Cite

CITATION STYLE

APA

Galeas, P., & Freisleben, B. (2008). Word distribution analysis for relevance ranking and query expansion. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4919 LNCS, pp. 500–511). https://doi.org/10.1007/978-3-540-78135-6_43

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free