Abstract
This work presents a novel end-to-end trainable CNN model for high-performance visual object tracking. It learns low-level fine-grained representations and a high-level semantic embedding space in a mutually reinforcing way, and a multi-task learning strategy is proposed to perform correlation analysis on representations from both levels. In particular, a fully convolutional encoder-decoder network is designed to reconstruct the original visual features from the semantic projections, preserving all the geometric information. Moreover, the correlation filter layer operating on the fine-grained representations leverages a global context constraint for accurate object appearance modeling. The correlation filter in this layer is updated online efficiently, without network fine-tuning. The proposed tracker therefore benefits from two complementary effects: the adaptability of the fine-grained correlation analysis and the generalization capability of the semantic embedding. Extensive evaluations on four popular benchmarks demonstrate its state-of-the-art performance.
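The correlation filter layer mentioned in the abstract builds on standard discriminative correlation filter tracking, where the filter has a closed-form solution in the Fourier domain and can therefore be updated online without back-propagation. The following is a minimal single-channel sketch of that idea (MOSSE-style), not the paper's exact multi-channel formulation with the global context constraint; the function names and the regularization value are illustrative assumptions.

```python
import numpy as np

def train_cf(x, y, lam=1e-2):
    """Solve for a correlation filter in closed form in the Fourier domain.

    x   : training patch (2-D array of features)
    y   : desired Gaussian-shaped response, peaked at the target center
    lam : regularization weight (illustrative value, not from the paper)

    Returns the conjugate filter spectrum: conj(X) * Y / (|X|^2 + lam).
    """
    x_hat = np.fft.fft2(x)
    y_hat = np.fft.fft2(y)
    return np.conj(x_hat) * y_hat / (x_hat * np.conj(x_hat) + lam)

def respond(w_hat, z):
    """Correlate the learned filter with a new search patch z.

    The location of the maximum of the returned response map is the
    estimated target position; circular shifts of z shift the peak.
    """
    z_hat = np.fft.fft2(z)
    return np.real(np.fft.ifft2(w_hat * z_hat))
```

Because training reduces to element-wise division in the frequency domain, the filter can be re-solved (or exponentially averaged) on every frame, which is what makes the online update in the tracker cheap compared with fine-tuning the network.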
Citation
Wang, Q., Zhang, M., Xing, J., Gao, J., Hu, W., & Maybank, S. (2018). Do not lose the details: Reinforced representation learning for high performance visual tracking. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 985–991). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/137