We propose a visual object tracker that improves accuracy while significantly decreasing the false alarm rate. This is achieved by a late fusion scheme that integrates the motion model of particle sampling with the region proposal network of Mask R-CNN during inference. The qualified bounding boxes selected by the late fusion are fed into the head layer of Mask R-CNN for detection of the tracked object. We refer to the introduced scheme as the target aware visual object tracker (TAVOT), since it is capable of minimizing false detections with the guidance of variable rate particle sampling initialized by the target region of interest. TAVOT is shown to model temporal video content with a simple motion model and thus constitutes a promising video object tracker. Performance evaluation on VOT2016 video sequences demonstrates that TAVOT increases the success rate by 22% while decreasing the false alarm rate by 73% compared to the baseline Mask R-CNN. Compared to the top tracker of VOT2016, an increase of around 5% in the success rate is reported where intersection over union is greater than 0.5.
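The late fusion idea and the IoU-based success criterion described above can be illustrated with a minimal sketch. The abstract does not state the exact fusion rule, so everything here is an assumption made for illustration: boxes are `(x1, y1, x2, y2)` tuples, the motion model is a plain Gaussian random walk with a fixed number of particles (not the variable rate sampling of the paper), and a proposal counts as "qualified" when it overlaps at least one particle with IoU at or above a threshold. The function names `sample_particles`, `fuse`, and `success_rate` are hypothetical and do not come from the paper or from any Mask R-CNN library.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_particles(prev_box, n=50, sigma=5.0, rng=None):
    """Assumed motion model: shift the previous target box by Gaussian noise."""
    rng = rng or random.Random(0)
    x1, y1, x2, y2 = prev_box
    particles = []
    for _ in range(n):
        dx, dy = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        particles.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return particles


def fuse(proposals, particles, thr=0.5):
    """Assumed fusion rule: keep proposals that overlap some particle."""
    return [p for p in proposals
            if any(iou(p, q) >= thr for q in particles)]


def success_rate(predicted, ground_truth, thr=0.5):
    """Fraction of frames whose predicted box has IoU > thr with the truth."""
    hits = sum(1 for p, g in zip(predicted, ground_truth) if iou(p, g) > thr)
    return hits / len(ground_truth)
```

In this sketch, the boxes returned by `fuse` stand in for the qualified proposals that would be passed on to the detection head, so proposals far from the predicted target location are discarded before classification. `success_rate` corresponds to the reported metric only in the sense that a frame counts as a success when intersection over union exceeds 0.5.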
CITATION STYLE
Ozer, C., Gurkan, F., & Gunsel, B. (2019). Target aware visual object tracking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11663 LNCS, pp. 186–198). Springer Verlag. https://doi.org/10.1007/978-3-030-27272-2_16