Distributing multimedia indexes across multiple nodes enables search over very large datasets (i.e., over one billion images and videos), but it raises two challenges. First, how should documents and queries be distributed effectively across nodes to support concurrent querying? Second, how should the system deal with the increased likelihood of unresponsive nodes (e.g., node fail-stops or dropped network packets)? An index whose partitions follow the distribution of feature vectors in the original space can improve redundancy and increase efficiency: a query's nearest neighbors are present on only a small, fixed number of partitions, which reduces the number of nodes to inspect for each query. This paper describes how sparse hashes can help find this balance and create better distribution policies for high-dimensional feature vectors. Inspired by existing literature on distributed text and media indexes, our proposal distributes and balances documents and queries to a subset of the nodes, according to their orthogonal similarities. We performed exhaustive benchmarks of our approach on a commercial cloud service. Experiments on a one-billion-vector dataset show that our approach has a low partitioning overhead (3 to 5 ms per query), achieves balanced document and query distribution (the variation in document and query distribution across nodes is smaller than 1% and 10%, respectively), handles concurrent queries effectively, and degrades gracefully with node failures (less than 2% precision loss per node down).
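The abstract does not spell out the hashing scheme, so the following is only a rough illustration of the routing idea, not the authors' method. It assumes a top-k random-projection sparse code as a stand-in for the paper's sparse hash. Each active dimension of a vector's code maps to a node. A document is stored on the nodes of its active dimensions, and a query is sent only to the nodes of its own active dimensions. The names `make_projection`, `sparse_hash`, `target_nodes`, and `ShardedIndex`, and the parameters `k` and `num_nodes`, are hypothetical.

```python
import random
from collections import defaultdict


def make_projection(dim, code_size, seed=0):
    """Hypothetical random Gaussian projection: code_size rows x dim columns."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(code_size)]


def sparse_hash(vector, projection, k):
    """Assumed sparse code: indices of the k largest projected activations."""
    scores = [sum(w * v for w, v in zip(row, vector)) for row in projection]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def target_nodes(vector, projection, k, num_nodes):
    """Nodes responsible for the active dimensions of the vector's sparse code."""
    return sorted({d % num_nodes for d in sparse_hash(vector, projection, k)})


class ShardedIndex:
    """Toy distributed index: each node id maps to a shard {doc_id: vector}."""

    def __init__(self, projection, k, num_nodes):
        self.projection = projection
        self.k = k
        self.num_nodes = num_nodes
        self.shards = defaultdict(dict)

    def add(self, doc_id, vector):
        # Replicate the document on every node of its sparse code (at most k nodes).
        for node in target_nodes(vector, self.projection, self.k, self.num_nodes):
            self.shards[node][doc_id] = vector

    def search(self, query, top=1):
        # Inspect only the nodes selected by the query's own sparse code.
        nodes = target_nodes(query, self.projection, self.k, self.num_nodes)
        candidates = {}
        for node in nodes:
            candidates.update(self.shards[node])

        def sq_dist(v):
            return sum((a - b) ** 2 for a, b in zip(v, query))

        ranked = sorted(candidates, key=lambda d: sq_dist(candidates[d]))
        return nodes, ranked[:top]


# Usage: 200 random 16-d vectors, a 32-d sparse code with k=3, spread over 8 nodes.
rng = random.Random(1)
docs = {i: [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(200)}
index = ShardedIndex(make_projection(16, 32), k=3, num_nodes=8)
for doc_id, vec in docs.items():
    index.add(doc_id, vec)

nodes, hits = index.search(docs[42])
print("queried nodes:", nodes, "top hit:", hits)
```

Similar vectors tend to share active dimensions, so a query reaches the shards that probably hold its neighbors while touching at most `k` of the `num_nodes` nodes. Storing each document on up to `k` nodes also gives some redundancy if one node stops responding. The query-load balancing reported in the abstract is not modeled in this sketch.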
Mourão, A., & Magalhães, J. (2019). Towards cloud distributed image indexing by sparse hashing. In ICMR 2019 - Proceedings of the 2019 ACM International Conference on Multimedia Retrieval (pp. 288–296). Association for Computing Machinery, Inc. https://doi.org/10.1145/3323873.3325046