Query-biased partitioning for selective search

Abstract

Selective search is a cluster-based distributed retrieval architecture that reduces computational costs by partitioning a corpus into topical shards and searching only a selected subset of them. Prior research formed topical shards by clustering the corpus based on the documents' contents. This content-based partitioning strategy reveals common topics in a corpus. However, the topic distribution produced by clustering may not match the distribution of topics in search traffic, which may reduce the effectiveness of selective search. This paper presents a query-biased partitioning strategy that aligns document partitions with topics from query logs. It focuses on two parts of the partitioning process: clustering initialization and document similarity calculation. A query-driven clustering initialization algorithm uses topics from query logs to form cluster seeds. A query-biased similarity metric favors terms that are important in query logs. Both methods boost retrieval effectiveness, reduce variance, and produce a more balanced distribution of shard sizes.
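
The abstract does not give the similarity formula, so the following is only a minimal sketch of the general idea behind a query-biased similarity metric, not the authors' method. It assumes a weighted cosine similarity in which each term's contribution is scaled by a weight derived from its frequency in a query log. The function names (`query_term_weights`, `biased_cosine`), the `1 + log(1 + count)` weighting, and the toy query log are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def query_term_weights(query_log):
    """Hypothetical weighting: 1 + log(1 + count of the term in the query log)."""
    counts = Counter(term for query in query_log for term in query.lower().split())
    return {term: 1.0 + math.log1p(count) for term, count in counts.items()}


def biased_cosine(doc_a, doc_b, weights, default=1.0):
    """Cosine similarity under a per-term weighted inner product.

    Terms missing from `weights` get `default`, so an empty `weights`
    dict reduces this to ordinary term-frequency cosine similarity.
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    def weight(term):
        return weights.get(term, default)

    dot = sum(weight(t) * tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(weight(t) * tf_a[t] ** 2 for t in tf_a))
    norm_b = math.sqrt(sum(weight(t) * tf_b[t] ** 2 for t in tf_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data, invented for illustration only.
query_log = ["cheap flights", "flights to paris", "paris hotels"]
weights = query_term_weights(query_log)

doc_1 = "flights paris airline"
doc_2 = "flights paris museum"

plain = biased_cosine(doc_1, doc_2, weights={})
biased = biased_cosine(doc_1, doc_2, weights=weights)
```

In this toy case the two documents share the terms that occur most often in the query log, so the biased score is higher than the plain score. A clustering step that used such a metric would tend to group documents by the vocabulary that searchers actually use, which is the kind of alignment the abstract describes.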

Cite

Dai, Z., Xiong, C., & Callan, J. (2016). Query-biased partitioning for selective search. In International Conference on Information and Knowledge Management, Proceedings (Vol. 24-28-October-2016, pp. 1119–1128). Association for Computing Machinery. https://doi.org/10.1145/2983323.2983706
