Word distribution analysis for relevance ranking and query expansion

Patricio Galeas; Bernd Freisleben

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 4919 LNCS 500-511

DOI: 10.1007/978-3-540-78135-6_43

Abstract

Apart from the frequency of terms in a document collection, the distribution of words plays an important role in determining the relevance of documents for a given search query. This paper introduces word distribution analysis, a novel approach that uses descriptive statistics to calculate a compressed representation of word positions in a document corpus. Based on this statistical approximation, two methods for improving the evaluation of document relevance are proposed: (a) a relevance ranking procedure based on how query terms are distributed over initially retrieved documents, and (b) a query expansion technique based on overlapping the distributions of terms in the top-ranked documents. Experimental results obtained for the TREC-8 document collection demonstrate that the proposed approach leads to an improvement of about 6.6% over the term frequency/inverse document frequency (TF-IDF) weighting scheme without applying query reformulation or relevance feedback techniques. © 2008 Springer-Verlag Berlin Heidelberg.
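
The abstract does not specify which descriptive statistics the authors use, so the following is only an illustrative sketch of the general idea, not the method from the paper. It assumes that a term's positions in a document are normalized to the range [0, 1] and summarized by their count, mean, and standard deviation, and that two terms' distributions are compared through the overlap of their mean ± standard deviation intervals. The function names `position_summary` and `interval_overlap` are hypothetical.

```python
from statistics import mean, pstdev


def position_summary(tokens, term):
    """Summarize where `term` occurs in `tokens`.

    Returns (count, mean, stdev) of the term's positions normalized
    to [0, 1], or None if the term does not occur.
    Assumption: this is one possible compressed representation,
    not necessarily the one used in the paper.
    """
    n = len(tokens)
    positions = [
        (i / (n - 1) if n > 1 else 0.0)
        for i, token in enumerate(tokens)
        if token == term
    ]
    if not positions:
        return None
    return len(positions), mean(positions), pstdev(positions)


def interval_overlap(a, b):
    """Length of the overlap between the intervals
    [mean - stdev, mean + stdev] of two summaries (0.0 if disjoint)."""
    low = max(a[1] - a[2], b[1] - b[2])
    high = min(a[1] + a[2], b[1] + b[2])
    return max(0.0, high - low)


if __name__ == "__main__":
    doc = "query terms spread over a document while other terms cluster near the query".split()
    for term in ("query", "terms"):
        print(term, position_summary(doc, term))
    print(interval_overlap(position_summary(doc, "query"),
                           position_summary(doc, "terms")))
```

Under these assumptions, each term in a document is stored as three numbers regardless of how often it occurs, and a larger overlap between two terms' intervals suggests that they tend to appear in the same regions of the text.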

Cite

Galeas, P., & Freisleben, B. (2008). Word distribution analysis for relevance ranking and query expansion. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4919 LNCS, pp. 500–511). https://doi.org/10.1007/978-3-540-78135-6_43
