Bag-of-words image representation: Key ideas and further insight

Marc T. Law; Nicolas Thome; Matthieu Cord

Book Chapter


Springer-Verlag London Ltd (2014), pp. 29–52

DOI: 10.1007/978-3-319-05696-8_2

Abstract

In the context of object and scene recognition, state-of-the-art performance is obtained with visual Bag-of-Words (BoW) models, which build mid-level representations from densely sampled local descriptors (e.g., Scale-Invariant Feature Transform (SIFT) descriptors). Several methods for combining low-level features and setting mid-level parameters have recently been evaluated for image classification. In this chapter, we study the different components of the BoW model for image classification in detail. In particular, we focus on the coding and pooling steps and investigate the impact of the main parameters of the BoW pipeline. We show that an adequate combination of several low-level (sampling rate, multiscale) and mid-level (codebook size, normalization) parameters is decisive for reaching good performance. Based on this analysis, we propose a merging scheme that exploits the specificities of edge-based descriptors: low- and high-contrast regions are pooled separately and then combined to provide a powerful image representation. We also study how the contrast threshold, which determines whether a SIFT descriptor corresponds to a low- or a high-contrast region, affects classification performance. Successful experiments are reported on the Caltech-101 and Scene-15 datasets.
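
The coding, pooling and contrast-based merging steps described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical illustration, not the authors' implementation: it uses hard vector quantization for coding, sum or max pooling, and L2 normalization. It also assumes that the "contrast" of a SIFT descriptor is measured by the L2 norm of the raw descriptor before normalization, and that the low- and high-contrast pooled vectors are each L2-normalized before being concatenated. The function names (`hard_assign`, `pool`, `contrast_split_bow`) are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def hard_assign(descriptor: Sequence[float], codebook: Sequence[Sequence[float]]) -> Vector:
    """Coding step (hard vector quantization): one-hot code of the nearest visual word."""
    def sq_dist(word: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(descriptor, word))

    nearest = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))
    code = [0.0] * len(codebook)
    code[nearest] = 1.0
    return code


def pool(codes: Sequence[Vector], size: int, mode: str = "sum") -> Vector:
    """Pooling step: aggregate per-descriptor codes into one image-level vector."""
    if not codes:
        return [0.0] * size
    if mode == "sum":
        return [sum(c[j] for c in codes) for j in range(size)]
    if mode == "max":
        return [max(c[j] for c in codes) for j in range(size)]
    raise ValueError(f"unknown pooling mode: {mode}")


def contrast_split_bow(
    descriptors: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
    threshold: float,
    mode: str = "sum",
) -> Vector:
    """Pool low- and high-contrast descriptors separately, then concatenate.

    Assumption (hypothetical): contrast is the L2 norm of the raw descriptor
    before normalization; descriptors below `threshold` count as low contrast.
    """
    low, high = [], []
    for d in descriptors:
        contrast = math.sqrt(sum(x * x for x in d))
        code = hard_assign(l2_normalize(d), codebook)
        (high if contrast >= threshold else low).append(code)
    size = len(codebook)
    # Each half is L2-normalized on its own before concatenation (an assumption).
    return l2_normalize(pool(low, size, mode)) + l2_normalize(pool(high, size, mode))


if __name__ == "__main__":
    # Toy 2-D "descriptors" and a 2-word codebook, for illustration only.
    codebook = [[1.0, 0.0], [0.0, 1.0]]
    descriptors = [[0.1, 0.0], [0.0, 0.2], [3.0, 0.5], [0.4, 4.0], [5.0, 0.0]]
    print(contrast_split_bow(descriptors, codebook, threshold=1.0))
```

With hard assignment, sum pooling yields a visual-word histogram, while max pooling only records whether each word occurs in the image. Splitting descriptors by contrast doubles the dimension of the final representation (twice the codebook size).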

Citation (APA)

Law, M. T., Thome, N., & Cord, M. (2014). Bag-of-words image representation: Key ideas and further insight. In Advances in Computer Vision and Pattern Recognition (Vol. 64, pp. 29–52). Springer-Verlag London Ltd. https://doi.org/10.1007/978-3-319-05696-8_2