Deep similarity tracking with two-stream or multi-stream network architectures has drawn great attention due to its strong capability of extracting discriminative features with a good balance of accuracy and speed. However, these networks require careful data pairing and are usually difficult to update for online visual tracking. In this paper, we propose a simple and effective discriminative feature extractor via Single-Stream Deep Similarity learning for online visual Tracking, abbreviated SSDST. Unlike the popular two-stream or multi-stream architectures, the proposed method is built on a standard single-branch CNN such as the VGG-M network. We design a contrastive loss layer, in which samples are paired implicitly, to learn discriminative features directly from a large video dataset. The network is easily adapted to online tracking on a specific video by replacing the contrastive loss layer with a binary classification layer. The proposed SSDST is extensively evaluated on two representative benchmarks and shows advantages over both online trackers and two-stream or multi-stream trackers.
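To make the implicit-pairing idea concrete, below is a minimal PyTorch sketch of a single-stream feature extractor trained with a contrastive loss whose pairs are formed inside each batch rather than fed through two branches. Everything here (the layer sizes, the margin value, the `contrastive_loss` helper, and the instance-id labeling scheme) is an illustrative assumption, not the authors' released implementation; the paper uses VGG-M, which is abbreviated to a small VGG-style stack for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleStreamNet(nn.Module):
    """One-branch feature extractor (a small VGG-style stack for brevity)."""

    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 128, kernel_size=5, stride=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.embed = nn.Linear(128, embed_dim)

    def forward(self, x):
        f = self.features(x).flatten(1)
        # Unit-norm embeddings so pairwise distances are bounded.
        return F.normalize(self.embed(f), dim=1)

def contrastive_loss(emb, labels, margin=1.0):
    """Contrastive loss over all pairs implicit in one batch.

    emb:    (B, D) embeddings from the single stream.
    labels: (B,) instance ids; equal ids form positive pairs.
    Assumes the batch contains at least one positive and one negative pair.
    """
    dist = torch.cdist(emb, emb)                        # (B, B) L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    diag = torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    pos = dist[same & ~diag].pow(2).mean()              # pull positives together
    neg = F.relu(margin - dist[~same]).pow(2).mean()    # push negatives apart
    return pos + neg

# Toy usage: 8 crops drawn from 4 target instances, paired implicitly
# by their instance ids instead of by an explicit two-branch pipeline.
net = SingleStreamNet()
crops = torch.randn(8, 3, 107, 107)
ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = contrastive_loss(net(crops), ids)
loss.backward()
```

For online tracking on a specific video, the same backbone would keep its learned features while the contrastive head is swapped for a binary target-versus-background classifier (e.g. a single `nn.Linear(128, 2)` layer fine-tuned on samples from that video), mirroring the head replacement the abstract describes.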
Citation: Ning, J., Shi, H., Ni, J., & Fu, Y. (2019). Single-stream deep similarity learning tracking. IEEE Access, 7, 127781–127787. https://doi.org/10.1109/ACCESS.2019.2939367