In this chapter, we present a framework to learn and predict regions of interest in videos, based on human eye movements. In our approach, the eye gaze information of several users is recorded as they watch videos that are similar and belong to a particular application domain. This information is used to train a classifier to learn low-level video features from regions that attracted the visual attention of users. This classifier is combined with vision-based approaches to provide an integrated framework to detect salient regions in videos. To date, saliency prediction has been viewed from two different perspectives, namely visual attention modeling and spatiotemporal interest point detection. These approaches have largely been vision-based: they detect regions having a predefined set of characteristics, such as complex motion or high contrast, for all kinds of videos. However, what is 'interesting' varies from one application to another. By learning features of regions that capture the attention of viewers while watching a video, we aim to distinguish the regions that are actually salient in the given context from the rest. The integrated approach ensures that both regions with anticipated content (top-down attention) and regions with unanticipated content (bottom-up attention) are predicted by the proposed framework as salient. In our experiments with news videos from popular channels, the results show a significant improvement in the identification of relevant salient regions in such videos, when compared with existing approaches.
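The integration idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the chapter's actual method: the two-dimensional region features (contrast, motion), the logistic-regression classifier, the toy gaze labels, and the choice of an element-wise maximum to fuse the learned (top-down) score with a vision-based (bottom-up) score.

```python
import numpy as np

def _with_bias(features):
    # Append a constant column so the classifier can learn an offset.
    return np.hstack([features, np.ones((features.shape[0], 1))])

def train_gaze_classifier(features, attended, lr=0.5, steps=500):
    """Fit a logistic regression by plain gradient descent.

    features: (n_regions, n_features) low-level descriptors (hypothetical).
    attended: (n_regions,) 1.0 if viewers' gaze fell on the region, else 0.0.
    """
    X = _with_bias(features)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - attended) / len(attended)
    return w

def top_down_scores(features, w):
    # Probability that a region attracts attention in this domain.
    return 1.0 / (1.0 + np.exp(-_with_bias(features) @ w))

def normalize(scores):
    # Rescale bottom-up scores to [0, 1]; a constant map becomes all zeros.
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def integrated_saliency(top_down, bottom_up):
    # A region counts as salient if either cue flags it (assumed fusion rule).
    return np.maximum(top_down, normalize(bottom_up))

# Toy data: each row is one region, columns are (contrast, motion).
features = np.array([[0.90, 0.10], [0.80, 0.20], [0.10, 0.90],
                     [0.20, 0.80], [0.85, 0.15], [0.15, 0.85]])
attended = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])  # made-up gaze labels

w = train_gaze_classifier(features, attended)
td = top_down_scores(features, w)

# Made-up vision-based scores; region 2 is unexpected but visually striking.
bottom_up = np.array([0.1, 0.2, 0.9, 0.1, 0.3, 0.2])
saliency = integrated_saliency(td, bottom_up)
```

In this toy setup, region 2 receives a low learned score because viewers did not look at similar regions, yet it is still marked salient through the bottom-up term, which mirrors the stated goal of covering both anticipated and unanticipated content.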
CITATION STYLE
Nataraju, S., Balasubramanian, V., & Panchanathan, S. (2011). An Integrated Approach to Visual Attention Modeling for Saliency Detection in Videos (pp. 181–214). https://doi.org/10.1007/978-0-85729-057-1_8