Recently, multi-object tracking (MOT) for estimating the trajectories of pedestrians has developed rapidly and plays an important role in human-centric video analysis. However, video analysis in complex events (e.g., the scenes in the HiEve dataset) remains under-explored. In complex real-world scenarios, the domain gap in unseen testing scenes and severe occlusions that disconnect tracks are challenging for existing online MOT methods without domain adaptation. To alleviate the domain gap, we study the problem in a transductive learning setting, which assumes that unlabeled testing data is available for learning offline tracking. We propose a transductive interactive self-training method that adapts the tracking model to unseen crowded scenes using unlabeled testing data by means of teacher-student interactive learning. To reduce prediction variance in an unseen domain, we train two different models and interactively teach each model with pseudo labels of the unlabeled data predicted by the other model. To improve robustness against occlusions during self-training, we exploit disconnected track interpolation (DTI) to refine the predicted pseudo labels. Our method achieves a MOTA of 60.23 on the HiEve dataset and won first place in the Multi-person Motion Tracking in Complex Events (with Private Detection) track of the ACM MM Grand Challenge on Large-scale Human-centric Video Analysis in Complex Events.
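The abstract describes two mechanisms: (1) an interactive self-training loop in which two models exchange pseudo labels on the unlabeled testing data, and (2) DTI, which fills occlusion-induced gaps in the predicted tracks before they are used as pseudo labels. The Python sketch below illustrates one plausible reading of these two steps under stated assumptions; every name in it (`generate_tracks`, `train_on`, `dti_refine`, `max_gap`, the track-as-dict representation) is a hypothetical placeholder for illustration, not the authors' actual interface.

```python
# A minimal sketch of interactive self-training with DTI, assuming each
# track is a dict mapping frame index -> bounding box (x, y, w, h).
# All method names and data structures here are hypothetical.

def dti_refine(tracks, max_gap=30):
    """Disconnected track interpolation (DTI): linearly interpolate
    bounding boxes across short frame gaps within each track, so that
    occlusion-induced breaks do not degrade the pseudo labels."""
    for track in tracks:
        frames = sorted(track)  # frame indices where the track is observed
        for a, b in zip(frames, frames[1:]):
            gap = b - a
            if 1 < gap <= max_gap:  # only bridge short disconnections
                box_a, box_b = track[a], track[b]
                for t in range(a + 1, b):
                    alpha = (t - a) / gap
                    track[t] = tuple(
                        (1 - alpha) * va + alpha * vb
                        for va, vb in zip(box_a, box_b)
                    )
    return tracks

def interactive_self_training(model_a, model_b, unlabeled_videos, rounds=3):
    """Each round, every model is fine-tuned on DTI-refined pseudo labels
    produced by its peer, which is intended to reduce the variance of
    either single model's predictions on the unseen domain."""
    for _ in range(rounds):
        pseudo_a = [dti_refine(model_a.generate_tracks(v)) for v in unlabeled_videos]
        pseudo_b = [dti_refine(model_b.generate_tracks(v)) for v in unlabeled_videos]
        model_a.train_on(unlabeled_videos, pseudo_b)  # A learns from B's labels
        model_b.train_on(unlabeled_videos, pseudo_a)  # B learns from A's labels
    return model_a, model_b
```

The key design choice this sketch highlights is that each model never trains on its own pseudo labels: cross-teaching between two different models is what the abstract credits with reducing prediction variance in the unseen domain.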