Recent years have witnessed top performance from trackers that integrate multi-level features of a pre-trained convolutional neural network (CNN) into the correlation filter framework. However, such trackers still suffer from background interference in the detection stage, caused by the large search region, and from contamination of the training samples caused by inaccurate tracking. In this paper, to suppress the interference of background features in the detection stage, we propose an effective spatial attention map (SAM) that differently weights the multi-hierarchical convolutional features of the search region to obtain attentional features; this reduces the filter values corresponding to background features. Moreover, we construct multiple elementary correlation filter (ECF) models on multi-hierarchical deep CNN features to track the target in parallel. To further improve tracking stability, we present a multi-model adaptive response fusion (MAF) mechanism that adaptively selects the outputs of reliable ECF models for weighted fusion by evaluating the confidence of the response maps obtained by convolving the attentional features with the ECF models. Finally, to adapt to target appearance changes in subsequent frames and avoid model corruption, we propose an adaptive updating strategy for both the SAM and the ECF models. Comprehensive experiments on the OTB-2013 and OTB-2015 datasets show the superiority of our algorithm over 12 other state-of-the-art approaches.
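The abstract does not give implementation details, but the detection-and-fusion step it describes can be illustrated with a minimal sketch: features from the search region are weighted by the spatial attention map, each ECF model produces a response map in the Fourier domain, and confident responses are combined by adaptive weighted fusion. The peak-to-sidelobe ratio (PSR) used here as the confidence measure, the confidence threshold, and all function names are assumptions for illustration only; the paper's actual confidence metric and fusion weights may differ.

```python
import numpy as np

def attentional_features(features, sam):
    """Weight multi-channel convolutional features (H, W, C) with a spatial attention map (H, W)."""
    return features * sam[..., None]

def cf_response(features, cf_model):
    """Correlation-filter detection in the Fourier domain; cf_model holds per-channel filters (H, W, C)."""
    F = np.fft.fft2(features, axes=(0, 1))
    R = np.sum(np.conj(cf_model) * F, axis=2)      # sum filter responses over channels
    return np.real(np.fft.ifft2(R))

def psr(response, margin=5):
    """Peak-to-sidelobe ratio of a response map (assumed confidence measure)."""
    py, px = np.unravel_index(response.argmax(), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(py - margin, 0):py + margin + 1, max(px - margin, 0):px + margin + 1] = False
    sidelobe = response[mask]
    return (response.max() - sidelobe.mean()) / (sidelobe.std() + 1e-8)

def fuse_responses(responses, conf_threshold=10.0):
    """Adaptive weighted fusion: keep only confident response maps and weight them by confidence."""
    confs = np.array([psr(r) for r in responses])
    keep = confs >= conf_threshold
    if not keep.any():                              # fall back to the single most confident map
        keep = confs == confs.max()
    w = confs * keep
    w = w / w.sum()
    return sum(wi * r for wi, r in zip(w, responses))

# Usage (hypothetical shapes): one attention-weighted feature map per CNN layer,
# one ECF model per layer, then fuse the resulting response maps.
# fused = fuse_responses([cf_response(attentional_features(f, sam), m)
#                         for f, m in zip(layer_features, ecf_models)])
```

The target location would then be taken from the peak of the fused response map, and the SAM and ECF models updated only when the fused response is deemed reliable, in line with the adaptive updating strategy described above.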
Citation:
Zhang, J., Wu, Y., Feng, W., & Wang, J. (2019). Spatially Attentive Visual Tracking Using Multi-Model Adaptive Response Fusion. IEEE Access, 7, 83873–83887. https://doi.org/10.1109/ACCESS.2019.2924944