In this paper, we report on our experience building an experimental similarity search system over a test collection of more than 50 million images, with the aim of showing that Content-based Image Retrieval (CBIR) systems can scale towards Web size. First, we had to tackle the non-trivial process of image crawling and descriptive feature extraction, which we performed on the European EGEE computer GRID. The resulting test collection, the first of such scale, will be opened to the research community for experiments and comparisons. Then, we had to develop indexing and searching mechanisms that can scale up to these volumes and answer similarity queries in real time. The results of our experiments are very encouraging for future applications.
Batko, M., Falchi, F., Lucchese, C., Novak, D., Perego, R., Rabitti, F., … Zezula, P. (2008). Crawling, indexing, and similarity searching images on the web (Extended abstract). In SEBD 2008 - Proceedings of the 16th Italian Symposium on Advanced Database Systems (pp. 382–389).