In this chapter, we address the open problem of meaningful object recognition in video. Approaches that estimate human visual attention and incorporate it into the overall visual content understanding process have recently become popular. Estimating visual attention in complex spatio-temporal content such as video requires fusing multiple information channels, including motion and spatial contrast. In the first part of the chapter, we study these questions and report on optimal strategies for bottom-up fusion in visual saliency estimation. The estimated visual saliency is then used to pool local descriptors. We compare different pooling approaches and show results on a challenging type of visual content: video recorded with wearable cameras for large-scale research on Alzheimer’s disease. The results, presented together with our conclusions, demonstrate that approaches based on saliency fusion outperform the best state-of-the-art techniques on this content.
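The two ideas summarized above, fusing bottom-up saliency channels and using the fused map to weight local descriptors during pooling, can be sketched as follows. This is a minimal illustration, not the chapter's exact method: the linear fusion weight `w_motion` and the weighted-average pooling scheme are assumptions made for clarity.

```python
import numpy as np

def fuse_saliency(motion_map, contrast_map, w_motion=0.5):
    """Linearly fuse two per-pixel saliency channels.
    (One simple bottom-up fusion strategy; the chapter compares several.)"""
    fused = w_motion * motion_map + (1.0 - w_motion) * contrast_map
    return fused / (fused.sum() + 1e-12)  # normalize to a distribution

def saliency_weighted_pool(descriptors, positions, saliency):
    """Pool local descriptors, weighting each one by the fused
    saliency value at its spatial location."""
    weights = np.array([saliency[y, x] for (y, x) in positions])
    weights /= weights.sum() + 1e-12
    return (weights[:, None] * descriptors).sum(axis=0)

# Toy example: a 4x4 frame with three 2-D local descriptors.
motion = np.random.rand(4, 4)     # stand-in for a motion saliency map
contrast = np.random.rand(4, 4)   # stand-in for a spatial-contrast map
sal = fuse_saliency(motion, contrast)
desc = np.random.rand(3, 2)       # three local descriptors
pos = [(0, 1), (2, 2), (3, 0)]    # their (row, col) positions
pooled = saliency_weighted_pool(desc, pos, sal)  # one frame-level vector
```

Descriptors falling in salient regions thus dominate the pooled representation, which is the intuition behind saliency-based pooling for recognition.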
Citation:
González-Díaz, I., Benois-Pineau, J., Buso, V., & Boujut, H. (2014). Fusion of multiple visual cues for object recognition in videos. In Advances in Computer Vision and Pattern Recognition (Vol. 64, pp. 79–107). Springer-Verlag London Ltd. https://doi.org/10.1007/978-3-319-05696-8_4