This paper proposes a superpixel-based spatiotemporal saliency model for saliency detection in videos. Based on the superpixel representation of video frames, motion histograms and color histograms are extracted at the superpixel level as local features and at the frame level as global features. Superpixel-level temporal saliency is then measured by combining the motion distinctiveness of superpixels with a temporal saliency prediction and adjustment scheme, and superpixel-level spatial saliency is measured by evaluating the global contrast and spatial sparsity of superpixels. Finally, a pixel-level saliency derivation method generates pixel-level temporal and spatial saliency maps, and an adaptive fusion method integrates them into the spatiotemporal saliency map. Experimental results on two public datasets demonstrate that the proposed model outperforms six state-of-the-art spatiotemporal saliency models in both saliency detection and human fixation prediction.
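To make the notion of superpixel-level global contrast concrete, the following is a minimal sketch, not the paper's formulation. It assumes that a superpixel's contrast can be approximated by the chi-square distance between its color histogram (local feature) and the frame-level color histogram (global feature). The distance measure, the function names, and the toy data are all illustrative assumptions, since the abstract does not specify them.

```python
from collections import defaultdict


def normalized_histogram(values, n_bins):
    """Return a histogram of integer bin indices that sums to 1."""
    hist = [0.0] * n_bins
    for v in values:
        hist[v] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def chi_square_distance(p, q):
    """Chi-square distance between two normalized histograms (range 0..1)."""
    # Bins that are empty in both histograms contribute nothing and are skipped.
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def global_contrast(labels, color_bins, n_bins):
    """Map each superpixel label to its contrast against the frame histogram.

    Assumption: contrast is the chi-square distance between the superpixel's
    color histogram and the whole-frame color histogram.
    """
    per_superpixel = defaultdict(list)
    all_values = []
    for label_row, bin_row in zip(labels, color_bins):
        for label, bin_index in zip(label_row, bin_row):
            per_superpixel[label].append(bin_index)
            all_values.append(bin_index)
    frame_hist = normalized_histogram(all_values, n_bins)
    return {
        label: chi_square_distance(normalized_histogram(vals, n_bins), frame_hist)
        for label, vals in per_superpixel.items()
    }


# Hypothetical 2x4 frame: superpixel 0 covers six pixels, superpixel 1 covers two.
labels = [[0, 0, 0, 1], [0, 0, 0, 1]]
# Quantized color bin index of each pixel (made-up values for illustration).
color_bins = [[0, 0, 0, 2], [0, 0, 0, 2]]
scores = global_contrast(labels, color_bins, n_bins=3)
```

In this toy frame the small superpixel with the rare color receives the higher score, which is the behavior a global-contrast cue is meant to capture. The same pattern would apply to motion histograms for a temporal cue, again under the assumptions stated above.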