Do not lose the details: Reinforced representation learning for high performance visual tracking


Abstract

This work presents a novel end-to-end trainable CNN model for high performance visual object tracking. It learns both low-level fine-grained representations and a high-level semantic embedding space in a mutually reinforced way, and a multi-task learning strategy is proposed to perform correlation analysis on the representations from both levels. In particular, a fully convolutional encoder-decoder network is designed to reconstruct the original visual features from the semantic projections, preserving all the geometric information. Moreover, the correlation filter layer working on the fine-grained representations leverages a global context constraint for accurate object appearance modeling. The correlation filter in this layer is updated online efficiently without network fine-tuning. The proposed tracker therefore benefits from two complementary effects: the adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding. Extensive experimental evaluations on four popular benchmarks demonstrate its state-of-the-art performance.
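For readers unfamiliar with correlation-filter tracking, the abstract's "correlation filter layer ... updated online efficiently without network fine-tuning" can be illustrated by the standard discriminative correlation filter (DCF) formulation: a closed-form ridge regression solved per frequency in the Fourier domain, with a running linear update. The sketch below is a simplified single-channel stand-in and is not the paper's exact layer (which also adds a global context constraint); all function names here are illustrative.

```python
import numpy as np

def train_correlation_filter(feat, label, lam=1e-2):
    """Closed-form ridge-regression filter in the Fourier domain
    (standard DCF; a simplified stand-in for the paper's layer).
    feat:  2D feature patch centered on the target
    label: desired 2D response (e.g., a Gaussian peaked on the target)
    lam:   regularization weight
    """
    F = np.fft.fft2(feat)
    Y = np.fft.fft2(label)
    # Per-frequency solution: w = conj(F) * Y / (conj(F) * F + lam)
    return np.conj(F) * Y / (np.conj(F) * F + lam)

def update_filter(w_old, w_new, lr=0.01):
    # Online update by linear interpolation: no network fine-tuning,
    # only the filter coefficients change between frames.
    return (1.0 - lr) * w_old + lr * w_new

def respond(w, feat):
    # Correlation response map over a search-region feature patch;
    # the peak gives the estimated target translation.
    return np.real(np.fft.ifft2(w * np.fft.fft2(feat)))
```

With a small `lam`, correlating the trained filter with the training patch approximately reproduces the desired label, so the response peak falls on the target location; successive frames reuse `update_filter` to adapt the model online.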

Citation (APA)

Wang, Q., Zhang, M., Xing, J., Gao, J., Hu, W., & Maybank, S. (2018). Do not lose the details: Reinforced representation learning for high performance visual tracking. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 985–991). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/137
