Selective search: Efficient and effective search of large textual collections


Abstract

The traditional search solution for large collections divides the collection into subsets (shards), and processes the query against all shards in parallel (exhaustive search). The search cost and the computational requirements of this approach are often prohibitively high for organizations with few computational resources. This article investigates and extends an alternative: selective search, an approach that partitions the dataset based on document similarity to obtain topic-based shards, and searches only a few shards that are estimated to contain relevant documents for the query. We propose shard creation techniques that are scalable, efficient, self-reliant, and create topic-based shards with low variance in size and high density of relevant documents. The experimental results demonstrate that the effectiveness of selective search is on par with that of exhaustive search, and the corresponding search costs are substantially lower with the former. Also, the majority of the queries perform as well or better with selective search. An oracle experiment that uses optimal shard ranking for a query indicates that selective search can outperform the effectiveness of exhaustive search. Comparison with a query optimization technique shows higher improvements in efficiency with selective search. The overall best efficiency is achieved when the two techniques are combined in an optimized selective search approach.
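The general idea described in the abstract (partition by document similarity, rank shards for a query, search only the top few) can be illustrated with a toy sketch. This is not the authors' method: the bag-of-words vectors, the k-means-style clustering loop, the centroid-based shard ranking, and all function names and parameters (`build_shards`, `selective_search`, `seeds`, `top_shards`) are assumptions made purely for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term-frequency vector (illustrative assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_shards(docs, seeds, rounds=5):
    """Group documents into topical shards with a simple k-means-style loop.

    `seeds` are hypothetical starting texts, one per shard.
    Returns (shards, centroids), where shards holds document indices.
    """
    vecs = [vectorize(d) for d in docs]
    centroids = [vectorize(s) for s in seeds]
    shards = [[] for _ in centroids]
    for _ in range(rounds):
        shards = [[] for _ in centroids]
        for i, v in enumerate(vecs):
            best = max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            shards[best].append(i)
        # Recompute each centroid as the summed term counts of its members;
        # an empty shard keeps its previous centroid.
        updated = []
        for c, members in enumerate(shards):
            if members:
                total = Counter()
                for i in members:
                    total.update(vecs[i])
                updated.append(total)
            else:
                updated.append(centroids[c])
        centroids = updated
    return shards, centroids


def selective_search(query, docs, shards, centroids, top_shards=1, top_docs=3):
    """Rank shards by query-centroid similarity, then search only the top ones."""
    q = vectorize(query)
    ranked = sorted(range(len(shards)), key=lambda c: cosine(q, centroids[c]), reverse=True)
    selected = ranked[:top_shards]
    candidates = [i for c in selected for i in shards[c]]
    scored = sorted(candidates, key=lambda i: cosine(q, vectorize(docs[i])), reverse=True)
    return selected, [docs[i] for i in scored[:top_docs]]


docs = [
    "python code compiler bug",
    "compiler optimization code speed",
    "soccer match goal league",
    "league soccer player transfer",
]
shards, centroids = build_shards(docs, seeds=["code compiler", "soccer league"])
selected, results = selective_search("soccer goal", docs, shards, centroids)
```

In this sketch the query is scored against only the documents of the selected shard, which is where the cost saving over exhaustive search would come from. A real system would replace the centroid comparison with a proper resource selection algorithm and the cosine scoring with a full retrieval model.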

Cite

Kulkarni, A., & Callan, J. (2015). Selective search: Efficient and effective search of large textual collections. ACM Transactions on Information Systems, 33(4). https://doi.org/10.1145/2738035
