Top-down visual saliency via joint CRF and dictionary learning

Abstract

Top-down visual saliency is an important module of visual attention. In this work, we propose a novel top-down saliency model that jointly learns a Conditional Random Field (CRF) and a visual dictionary. The proposed model incorporates a layered structure from top to bottom: CRF, sparse coding, and image patches. With sparse coding as an intermediate layer, the CRF is learned in a feature-adaptive manner; meanwhile, with the CRF as the output layer, the dictionary is learned under structured supervision. For efficient and effective joint learning, we develop a max-margin approach via a stochastic gradient descent algorithm. Experimental results on the Graz-02 and PASCAL VOC datasets show that our model performs favorably against state-of-the-art top-down saliency methods for target object localization. In addition, the dictionary update significantly improves the performance of our model. We demonstrate the merits of the proposed top-down saliency model by applying it to prioritizing object proposals for detection and predicting human fixations.
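To make the layered structure concrete, the sketch below illustrates the inference pipeline the abstract describes: image patches are sparse-coded over a dictionary, the codes feed linear (unary) CRF potentials, and pairwise smoothing couples neighboring patches. This is a minimal illustration, not the authors' implementation; the solver (ISTA for the lasso), the neighbor-averaging stand-in for CRF inference, and all names (`sparse_codes`, `saliency_map`, `D`, `w`, `lam`, `beta`) are assumptions for exposition. The paper's joint max-margin learning of `D` and the CRF weights is not shown.

```python
import numpy as np

def sparse_codes(patches, D, lam=0.1, n_iter=50):
    """Encode patches (n, d) over dictionary D (d, k) with ISTA,
    a proximal solver for min_a 0.5*||x - D a||^2 + lam*||a||_1.
    (Illustrative stand-in for the paper's sparse-coding layer.)"""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth term
    A = np.zeros((patches.shape[0], D.shape[1]))
    for _ in range(n_iter):
        grad = (A @ D.T - patches) @ D       # gradient of the quadratic fit term
        A = A - grad / L                     # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft threshold
    return A

def saliency_map(patches, grid_shape, D, w, n_smooth=10, beta=0.5):
    """Unary CRF-style scores w . a_i on the sparse codes, followed by
    iterative neighbor averaging as a crude proxy for CRF pairwise
    smoothing (actual CRF inference would use graph cuts or message
    passing). All parameters here are hypothetical."""
    A = sparse_codes(patches, D)
    s = (A @ w).reshape(grid_shape)          # unary potentials on the patch grid
    for _ in range(n_smooth):
        nb = (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
              np.roll(s, 1, 1) + np.roll(s, -1, 1)) / 4.0
        s = (1 - beta) * s + beta * nb       # blend each score with its neighbors
    return 1.0 / (1.0 + np.exp(-s))          # squash to a [0, 1] saliency map
```

As a usage note under these assumptions: `patches` would hold vectorized local descriptors on a regular grid, so `grid_shape` maps the per-patch scores back to image layout; in the paper, `D` and `w` would instead come out of the joint max-margin training rather than being supplied independently.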

Citation (APA)
Yang, J., & Yang, M. H. (2017). Top-down visual saliency via joint CRF and dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(3), 576–588. https://doi.org/10.1109/TPAMI.2016.2547384
