Generic object tracking is a fundamental vision task. Numerous attempts have been made to exploit handcrafted features such as HOG, deep convolutional features pretrained on other vision tasks, and hierarchical features. These methods achieve a good balance between accuracy and speed in visual tracking. However, they exploit the complementary characteristics of deep and shallow features only imperfectly and ignore surrounding background information. In this paper, we exploit multi-cue cascades to build a robust end-to-end visual tracker, which cascades the responses of each level by fully exploring the complementary properties of the different levels of learning. First, we crop out image patches and extract their features to construct the corresponding levels of learning, each of which is designed to cope with a different tracking challenge. Second, this multi-level learning procedure is embedded into a dynamic Siamese network for end-to-end training. Additionally, we take surrounding background information into account in the high-level learning. Finally, the outputs of all levels are fused, yielding a favorable trade-off between accuracy and robustness. Extensive experiments on OTB-2013, OTB-2015, and VOT2016 demonstrate that the proposed tracker performs favorably against state-of-the-art trackers while being more robust to background clutter.
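To make the fusion idea concrete, here is a minimal sketch of Siamese-style matching with per-level response fusion: each feature level (shallow vs. deep) produces a cross-correlation response map over the search region, and the maps are combined into a single score map. The fixed fusion weights, the min-max normalization, and the toy tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def xcorr(template_feat, search_feat):
    """Cross-correlate a template feature map with a search-region
    feature map (the Siamese matching step), one level at a time."""
    # template_feat: (C, th, tw), search_feat: (C, sh, sw)
    return F.conv2d(search_feat.unsqueeze(0),          # (1, C, sh, sw)
                    template_feat.unsqueeze(0))[0, 0]  # -> (rh, rw)

def fused_response(template_feats, search_feats, weights):
    """Fuse per-level response maps with fixed weights; each level
    (shallow vs. deep features) contributes a complementary cue."""
    responses = []
    for t, s, w in zip(template_feats, search_feats, weights):
        r = xcorr(t, s)
        # Normalize each level's response so the fusion weights are
        # comparable across levels (an assumption for this sketch).
        r = (r - r.min()) / (r.max() - r.min() + 1e-8)
        responses.append(w * r)
    return torch.stack(responses).sum(dim=0)

# Toy usage: two levels (shallow, deep) with matching spatial sizes.
shallow_t, shallow_s = torch.randn(64, 6, 6), torch.randn(64, 22, 22)
deep_t, deep_s = torch.randn(256, 6, 6), torch.randn(256, 22, 22)
score = fused_response([shallow_t, deep_t], [shallow_s, deep_s],
                       weights=[0.3, 0.7])
peak = torch.nonzero(score == score.max())[0]  # predicted target location
```

In this sketch, `F.conv2d` with the template as the kernel implements dense cross-correlation over the search region, a common trick in Siamese trackers; in the paper, each level's response would additionally be shaped by its level-specific learning and by the background-aware high-level cue before fusion.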