Selective search is a cluster-based distributed retrieval architecture that reduces computational costs by partitioning a corpus into topical shards and searching only the shards likely to be relevant to each query. Prior research formed topical shards by clustering the corpus based on the documents' contents. This content-based partitioning strategy reveals common topics in a corpus. However, the topic distribution produced by clustering may not match the distribution of topics in search traffic, which may reduce the effectiveness of selective search. This paper presents a query-biased partitioning strategy that aligns document partitions with topics from query logs. It focuses on two parts of the partitioning process: clustering initialization and document similarity calculation. A query-driven clustering initialization algorithm uses topics from query logs to form cluster seeds. A query-biased similarity metric favors terms that are important in query logs. Both methods boost retrieval effectiveness, reduce variance, and produce a more balanced distribution of shard sizes.
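The abstract does not state the formula behind the query-biased similarity metric, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a weighted cosine similarity in which each term's contribution is scaled by a hypothetical weight, `1 + log(1 + query frequency)`. The function names, the whitespace tokenization, and the weighting scheme are all illustrative assumptions.

```python
import math
from collections import Counter


def count_query_terms(query_log):
    """Count how often each term appears across a list of query strings."""
    return Counter(term for query in query_log for term in query.lower().split())


def term_weight(term, query_counts):
    # Hypothetical weighting (an assumption, not taken from the paper):
    # terms never seen in the query log get weight 1.0, and frequently
    # queried terms get a logarithmically larger weight.
    return 1.0 + math.log1p(query_counts.get(term, 0))


def query_biased_cosine(doc_a, doc_b, query_counts):
    """Weighted cosine similarity between two whitespace-tokenized documents."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    # Weighted inner product over the terms the two documents share.
    dot = sum(term_weight(t, query_counts) * va[t] * vb[t] for t in va.keys() & vb.keys())
    # Norms use the same term weights, which keeps the score within [0, 1].
    norm_a = math.sqrt(sum(term_weight(t, query_counts) * c * c for t, c in va.items()))
    norm_b = math.sqrt(sum(term_weight(t, query_counts) * c * c for t, c in vb.items()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy data, invented for illustration only.
query_counts = count_query_terms(["jaguar speed", "jaguar habitat"])
unbiased = query_biased_cosine("jaguar cat runs", "jaguar car runs", Counter())
biased = query_biased_cosine("jaguar cat runs", "jaguar car runs", query_counts)
```

In this toy example the two documents share the term "jaguar", which occurs in the query log, so the biased score is higher than the unbiased one. A clustering step built on such a metric would tend to group documents by the terms users actually search for, which is the effect the abstract describes.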
CITATION STYLE
Dai, Z., Xiong, C., & Callan, J. (2016). Query-biased partitioning for selective search. In International Conference on Information and Knowledge Management, Proceedings (Vol. 24-28-October-2016, pp. 1119–1128). Association for Computing Machinery. https://doi.org/10.1145/2983323.2983706