Fusion of multiple visual cues for object recognition in videos

Abstract

In this chapter, we are interested in the open problem of meaningful object recognition in video. Recently, approaches that estimate human visual attention and incorporate it into the overall visual content understanding process have become popular. Estimating visual attention in complex spatio-temporal content such as video requires fusing multiple information channels, such as motion and spatial contrast. In the first part of the chapter, we address these questions and report on optimal strategies for bottom-up fusion in visual saliency estimation. The estimated visual saliency is then used in the pooling of local descriptors. We compare different pooling approaches and show results on challenging visual content: video recorded with wearable cameras for large-scale research on Alzheimer's disease. The results demonstrate that the approaches based on saliency fusion outperform the best state-of-the-art techniques on this content.
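To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation: (1) bottom-up fusion of saliency channels by a simple convex combination, and (2) saliency-weighted pooling of local descriptors. All function names and the linear fusion rule are illustrative assumptions; the chapter itself studies fusion strategies that may be more elaborate.

```python
import numpy as np

def fuse_saliency(motion_map, contrast_map, alpha=0.5):
    """Fuse two bottom-up saliency channels by a convex combination.

    motion_map, contrast_map : 2-D arrays of the same shape, values in [0, 1].
    alpha : weight of the motion channel (a hypothetical tuning parameter).
    """
    fused = alpha * motion_map + (1.0 - alpha) * contrast_map
    # Renormalise so the fused map is again a valid saliency map in [0, 1].
    return (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)

def saliency_weighted_pooling(descriptors, positions, saliency):
    """Pool local descriptors, weighting each by the saliency at its
    image position (a simple weighted-average variant of saliency-based
    pooling; assumed here for illustration).

    descriptors : (N, D) array of local descriptors (e.g. SIFT).
    positions   : (N, 2) integer array of (row, col) keypoint locations.
    saliency    : 2-D fused saliency map.
    """
    weights = saliency[positions[:, 0], positions[:, 1]]
    weights = weights / (weights.sum() + 1e-8)
    return (weights[:, None] * descriptors).sum(axis=0)

# Toy usage on random data.
rng = np.random.default_rng(0)
motion = rng.random((120, 160))
contrast = rng.random((120, 160))
fused = fuse_saliency(motion, contrast, alpha=0.6)

desc = rng.random((50, 128))                     # 50 local descriptors
pos = rng.integers(0, [120, 160], size=(50, 2))  # their pixel positions
pooled = saliency_weighted_pooling(desc, pos, fused)
print(pooled.shape)  # (128,)
```

In this sketch, descriptors falling on salient regions contribute more to the pooled representation, which is the intuition behind using estimated visual saliency to guide descriptor pooling.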

Citation (APA)

González-Díaz, I., Benois-Pineau, J., Buso, V., & Boujut, H. (2014). Fusion of multiple visual cues for object recognition in videos. In Advances in Computer Vision and Pattern Recognition (Vol. 64, pp. 79–107). Springer-Verlag London Ltd. https://doi.org/10.1007/978-3-319-05696-8_4
