An Object-Based Bayesian Framework for Top-Down Visual Attention


Abstract

We introduce a new task-independent framework to model top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions, which are computed from manual annotations of objects in video scenes or from state-of-the-art object detection models. Evaluating on ∼3 hours (approximately 315,000 eye fixations and 12,600 saccades) of eye-tracking data from observers playing 3 video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations than: 1) simpler classifier-based models, also developed here, that map a signature of a scene (multi-modal information from gist, bottom-up saliency, physical actions, and events) to eye positions; 2) 14 state-of-the-art bottom-up saliency models; and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective in employing and reasoning over spatio-temporal visual data.
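
To make the inference step concrete, here is a minimal sketch of forward filtering in a DBN whose hidden state is the currently attended object. The object set, transition matrix, and observation likelihoods below are illustrative assumptions for a driving-like scene, not the parameters learned in the paper:

```python
# Minimal DBN forward-filtering sketch: the hidden state is the attended
# object; evidence at each frame could come from object-detector confidences.
# All probabilities here are hypothetical, not the authors' learned values.
import numpy as np

objects = ["car", "pedestrian", "road_sign"]  # hypothetical object set

# P(attended_t | attended_{t-1}): rows = previous object, cols = current.
transition = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.2, 0.6],
])

# P(observation_t | attended_t): one likelihood row per time step.
likelihoods = np.array([
    [0.8, 0.1, 0.1],  # t = 0
    [0.2, 0.7, 0.1],  # t = 1
    [0.1, 0.2, 0.7],  # t = 2
])

belief = np.full(len(objects), 1.0 / len(objects))  # uniform prior
for t, lik in enumerate(likelihoods):
    predicted = transition.T @ belief  # propagate belief through dynamics
    belief = predicted * lik           # weight by current-frame evidence
    belief /= belief.sum()             # renormalize to a distribution
    print(f"t={t}: posterior over attended objects =",
          dict(zip(objects, belief.round(3))))
```

The same filtered posterior over objects can then be mapped to spatial locations (e.g., via the attended object's bounding box) to produce a fixation prediction per frame.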

Citation (APA)

Borji, A., Sihite, D. N., & Itti, L. (2012). An Object-Based Bayesian Framework for Top-Down Visual Attention. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, AAAI 2012 (pp. 1529–1535). AAAI Press. https://doi.org/10.1609/aaai.v26i1.8334
