On-the-fly learning for visual search of large-scale image and video datasets

41 citations · 88 Mendeley readers

This article is free to access.

Abstract

The objective of this work is to visually search large-scale video datasets for semantic entities specified by a text query. The paradigm we explore is constructing visual models for such semantic entities on-the-fly, i.e. at run time, by using an image search engine to source visual training data for the text query. The approach combines fast and accurate learning and retrieval, and enables videos to be returned within seconds of specifying a query. We describe three classes of queries, each with its associated visual search method: object instances (using a bag of visual words approach for matching); object categories (using a discriminative classifier for ranking key frames); and faces (using a discriminative classifier for ranking face tracks). We discuss the features suitable for each class of query, for example Fisher vectors or features derived from convolutional neural networks (CNNs), and how these choices impact on the trade-off between three important performance measures for a real-time system of this kind, namely: (1) accuracy, (2) memory footprint, and (3) speed. We also discuss and compare a number of important implementation issues, such as how to remove ‘outliers’ in the downloaded images efficiently, and how to best obtain a single descriptor for a face track. We also sketch the architecture of the real-time on-the-fly system. Quantitative results are given on a number of large-scale image and video benchmarks (e.g. TRECVID INS, MIRFLICKR-1M), and we further demonstrate the performance and real-world applicability of our methods over a dataset sourced from 10,000 h of unedited footage from BBC News, comprising 5M+ key frames.
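To make the category-search pipeline above concrete, here is a minimal sketch of the on-the-fly step: a linear SVM is trained at query time on features of images downloaded for the text query, against a fixed pool of negative features, and all dataset key frames are then ranked by classifier score. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names (extract_cnn_features, on_the_fly_rank) are hypothetical, and the feature extractor is a stub returning random L2-normalised descriptors so the sketch runs end to end. Only LinearSVC is a real scikit-learn API.

import numpy as np
from sklearn.svm import LinearSVC

def extract_cnn_features(images):
    # Placeholder for a real CNN feature extractor (e.g. the penultimate
    # layer of a pre-trained network); here it returns random
    # L2-normalised descriptors so the sketch is runnable.
    feats = np.random.randn(len(images), 128).astype(np.float32)
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def on_the_fly_rank(query_images, negative_feats, keyframe_feats):
    # Positives: features of images downloaded for the text query.
    pos = extract_cnn_features(query_images)
    # Train a linear SVM at query time against a fixed negative pool.
    X = np.vstack([pos, negative_feats])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(negative_feats))])
    clf = LinearSVC(C=1.0).fit(X, y)
    # Rank all dataset key frames by classifier score, best first.
    scores = clf.decision_function(keyframe_feats)
    return np.argsort(-scores)

# Example with dummy data: 40 "downloaded" images, a 1,000-image
# negative pool, and 10,000 key frames to rank.
query_images = [None] * 40  # stand-ins; only len() is used above
negative_feats = extract_cnn_features([None] * 1000)
keyframe_feats = extract_cnn_features([None] * 10000)
ranking = on_the_fly_rank(query_images, negative_feats, keyframe_feats)
print(ranking[:10])  # indices of the ten highest-scoring key frames

The design choice a sketch like this reflects is that a linear classifier keeps query-time training cheap, and because the key-frame features are precomputed, scoring reduces to a single dot product per frame, which is what allows results to be returned within seconds of the query.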

Citation (APA)

Chatfield, K., Arandjelović, R., Parkhi, O., & Zisserman, A. (2015). On-the-fly learning for visual search of large-scale image and video datasets. International Journal of Multimedia Information Retrieval, 4(2), 75–93. https://doi.org/10.1007/s13735-015-0077-0

Readers over time

Annual reader counts from 2015 to 2024 (chart not reproduced).

Readers' Seniority

PhD / Post grad / Masters / Doc: 44 (67%)
Researcher: 12 (18%)
Lecturer / Post doc: 7 (11%)
Professor / Associate Prof.: 3 (5%)

Readers' Discipline

Computer Science: 50 (82%)
Engineering: 7 (11%)
Physics and Astronomy: 2 (3%)
Medicine and Dentistry: 2 (3%)
