The top-k retrieval problem requires finding the k objects most similar to a given query object. Similarities between objects are most often computed by aggregating the similarities of their attribute values. We consider the case where the similarities between attribute values are arbitrary (non-metric), so standard space-partitioning indexes cannot be used. Among the most popular techniques that can handle arbitrary similarity measures is the family of threshold algorithms. These were designed as middleware algorithms that assume similarity lists are available for each attribute and focus on merging these lists efficiently to arrive at the results. In this paper, we explore multi-dimensional indexing of non-metric spaces that exploits inter-attribute relationships to prune the search space efficiently during top-k computation. We propose an indexing structure, the AL-Tree, and an algorithm that uses it for online top-k retrieval. The AL-Tree exploits the fact that many real-world attributes come from a small value space. We show that our algorithm performs much better than the threshold-based algorithms in terms of computational cost, owing to efficient pruning of the search space. Further, it outperforms them in terms of I/Os by up to an order of magnitude on dense datasets. Copyright 2008 ACM.
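For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of a generic threshold-style merge over per-attribute similarity lists. It is not the AL-Tree and not the authors' implementation. The function name `threshold_top_k`, the input layout, and the choice of `sum` as the aggregation function are assumptions made for illustration; the sketch also assumes every object appears in every list and that the aggregate is monotone in each attribute similarity.

```python
import heapq


def threshold_top_k(sorted_lists, k, aggregate=sum):
    """Return the k best (object_id, score) pairs, best first.

    sorted_lists: one list per attribute of (object_id, similarity) pairs,
    each sorted by descending similarity to the query (assumed layout).
    aggregate: monotone function combining per-attribute similarities.
    """
    # Dictionaries stand in for random access into each attribute list.
    lookup = [dict(lst) for lst in sorted_lists]
    seen = set()
    top = []  # min-heap of (score, object_id), holds at most k entries

    for pos in range(len(sorted_lists[0])):
        last_seen = []
        # Sorted access: read one entry from every list at this depth.
        for lst in sorted_lists:
            obj, sim = lst[pos]
            last_seen.append(sim)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: fetch the object's similarity on every attribute.
            score = aggregate(table[obj] for table in lookup)
            if len(top) < k:
                heapq.heappush(top, (score, obj))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, obj))

        # No unseen object can score above the aggregate of the last
        # similarities read from each list, so stop once the current
        # k-th best score reaches that threshold.
        threshold = aggregate(last_seen)
        if len(top) == k and top[0][0] >= threshold:
            break

    ranked = sorted(top, key=lambda entry: (-entry[0], entry[1]))
    return [(obj, score) for score, obj in ranked]
```

With two hypothetical attribute lists, `[("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]` and `[("b", 0.9), ("a", 0.7), ("d", 0.6), ("c", 0.2)]`, and `k=2`, the merge stops after reading two entries from each list and never touches `c` or `d`. The pruning here comes only from the list order; it does not use relationships between attributes, which is the gap the abstract says the AL-Tree is meant to address.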
Deshpande, P. M., Deepak, P., & Kummamuru, K. (2008). Efficient online top-k retrieval with arbitrary similarity measures. In Advances in Database Technology - EDBT 2008 - 11th International Conference on Extending Database Technology, Proceedings (pp. 356–367). https://doi.org/10.1145/1353343.1353388