The power of distance distributions: Cost models and scheduling policies for quality-controlled similarity queries

Paolo Ciaccia; Marco Patella

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10609 LNCS 3-16

DOI: 10.1007/978-3-319-68474-1_1

Abstract

Approximate similarity queries are a practical way to obtain good, yet suboptimal, results from large data sets without paying high execution costs. In this paper we analyze how the strategy for searching through an index tree, also called the scheduling policy, can influence costs. We consider quality-controlled similarity queries, in which the user sets a quality (distance) threshold θ and the system halts as soon as it finds k objects in the data set at distance ≤ θ from the query object. After providing experimental evidence that the scheduling policy can indeed have a strong impact on the costs incurred, we characterize the policies’ behavior through an analytical cost model, in which a major role is played by parameterized local distance distributions. Such distributions are also the key to deriving new scheduling policies, which we show to be optimal in a simplified, yet relevant, scenario.
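
The halting rule and the role of the scheduling policy described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the index is reduced to a flat list of leaves over 1-D points, the cost is simply the number of leaves read, and the names `Leaf`, `quality_controlled_query`, `min_dist` and `center_dist` are hypothetical. The two policies are generic orderings chosen for illustration; they are not the distribution-based policies derived in the paper.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Leaf:
    center: float         # routing object of the leaf (1-D, an assumption)
    radius: float         # covering radius: all objects lie within it
    objects: List[float]


def quality_controlled_query(
    leaves: List[Leaf],
    query: float,
    k: int,
    theta: float,
    policy: Callable[[float, float], float],
) -> Tuple[List[float], int]:
    """Return (results, leaves_read); halt once k objects at distance <= theta are found."""
    queue: List[Tuple[float, int]] = []
    for i, leaf in enumerate(leaves):
        d_center = abs(query - leaf.center)
        lower_bound = max(0.0, d_center - leaf.radius)
        # A leaf whose lower bound exceeds theta cannot hold a qualifying object.
        if lower_bound <= theta:
            heapq.heappush(queue, (policy(d_center, leaf.radius), i))

    results: List[float] = []
    leaves_read = 0
    while queue and len(results) < k:
        _, i = heapq.heappop(queue)   # the policy decides which leaf is read next
        leaves_read += 1
        for obj in leaves[i].objects:
            if abs(query - obj) <= theta:
                results.append(obj)
                if len(results) == k:
                    break
    return results, leaves_read


def min_dist(d_center: float, radius: float) -> float:
    """Illustrative policy: smallest possible distance to any object in the leaf."""
    return max(0.0, d_center - radius)


def center_dist(d_center: float, radius: float) -> float:
    """Illustrative policy: distance from the query to the leaf's routing object."""
    return d_center


if __name__ == "__main__":
    # Toy data (made up): a wide leaf with no close objects, a tight leaf with two.
    toy = [Leaf(4.0, 4.5, [3.0, 5.0, 8.0]), Leaf(1.0, 0.5, [0.6, 1.4])]
    for policy in (min_dist, center_dist):
        print(policy.__name__, quality_controlled_query(toy, 0.0, 1, 1.0, policy))
```

On this toy data both policies return the same object, but the first reads two leaves and the second reads one, which is the kind of cost difference between scheduling policies that the abstract refers to.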

Cite

Ciaccia, P., & Patella, M. (2017). The power of distance distributions: Cost models and scheduling policies for quality-controlled similarity queries. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10609 LNCS, pp. 3–16). Springer Verlag. https://doi.org/10.1007/978-3-319-68474-1_1
