Abstract
We investigate visual-query-based retrieval from large image datasets when the visual queries comprise arbitrary regions of interest (ROI) rather than entire images. We propose a compact image descriptor, termed Voronoi VLAD (VVLAD), that combines the vector of locally aggregated descriptors (VLAD) of Jégou et al. with a multi-level, Voronoi-based spatial partitioning of each dataset image. The proposed multi-level Voronoi partitioning applies a spatial hierarchical K-means to interest-point locations and computes a VLAD over each cell. To reduce the matching complexity when handling very large datasets, we propose the following modifications. First, we use the tree structure of the spatial hierarchical K-means to perform top-to-bottom pruning for local similarity maxima, rather than exhaustively matching against all cells (Fast-VVLAD). Second, we aggregate the VLADs of adjacent Voronoi cells to reduce the overall VVLAD storage requirement per image. Finally, we propose a new image similarity score for Fast-VVLAD that combines relevant information from all partition levels into a single similarity measure. For a range of ROI queries on two standard datasets, Fast-VVLAD achieves comparable or higher mean Average Precision than the state-of-the-art Multi-VLAD framework while offering more than two-fold acceleration.
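The core idea of the abstract, a VLAD computed over each cell of a hierarchical spatial K-means partition, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`vlad`, `kmeans_labels`, `cell_vlads`), the branching factor `k`, the number of levels, the plain Lloyd's K-means, and the exhaustive dot-product matching at the end are all assumptions made for illustration. The pruning, cell aggregation, and multi-level similarity score of Fast-VVLAD are not reproduced here.

```python
import numpy as np

def vlad(descs, codebook):
    """L2-normalised VLAD: per-codeword sum of residuals to the nearest codeword."""
    k, d = codebook.shape
    out = np.zeros((k, d))
    if len(descs):
        # Squared distances from every descriptor to every codeword.
        dist = ((descs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dist.argmin(axis=1)
        for j in range(k):
            members = descs[nearest == j]
            if len(members):
                out[j] = (members - codebook[j]).sum(axis=0)
    v = out.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on 2-D locations (assumed stand-in for the paper's
    spatial K-means). Assigning each point to its nearest centre yields the
    Voronoi cells of those centres."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def cell_vlads(points, descs, codebook, k=2, levels=2, seed=0):
    """Recursively split keypoint locations and return one VLAD per cell,
    grouped by partition level (hypothetical layout, chosen for clarity)."""
    per_level = [[] for _ in range(levels)]

    def split(idx, level):
        if level == levels or len(idx) < k:
            return
        labels = kmeans_labels(points[idx], k, seed=seed)
        for j in range(k):
            child = idx[labels == j]
            if len(child):
                per_level[level].append(vlad(descs[child], codebook))
                split(child, level + 1)

    split(np.arange(len(points)), 0)
    return per_level

# Synthetic stand-ins for keypoint locations, local descriptors and a codebook.
rng = np.random.default_rng(1)
points = rng.uniform(0, 100, size=(200, 2))
descs = rng.normal(size=(200, 8))
codebook = rng.normal(size=(4, 8))

levels = cell_vlads(points, descs, codebook)
query = vlad(descs[points[:, 0] < 30], codebook)  # ROI: left strip of the image
# Exhaustive matching over all cells; Fast-VVLAD would prune this search.
best = max(float(query @ c) for level in levels for c in level)
```

Because every VLAD is L2-normalised, the dot product is a cosine similarity, so `best` lies in [-1, 1].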
Cite
Chadha, A., & Andreopoulos, Y. (2015). Region-of-interest retrieval in large image datasets with Voronoi VLAD. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9163, pp. 218–227). Springer Verlag. https://doi.org/10.1007/978-3-319-20904-3_21