Video Instance Segmentation (VIS) aims to detect, segment, and track instances appearing in a video. To reduce annotation costs, some existing VIS methods adopt the Weakly Supervised Scheme (WSVIS). However, these WSVIS methods usually run in an offline manner and thus fail to handle ongoing and long videos under limited computational resources. It would therefore be of considerable benefit if online models could match or surpass the performance of offline models. In this paper, we propose OWS-Seg, an end-to-end, simple, and efficient online WSVIS network trained with box annotations. Concretely, OWS-Seg consists of two novel contrastive learning branches: the Instance Contrastive Learning (ICL) branch learns instance-level discriminative features to distinguish different instances in each frame, and the Mask Contrastive Learning (MCL) branch with Boxccam learns pixel-level discriminative features to differentiate foreground from background. Experimental results show that OWS-Seg achieves promising performance, e.g., 43.5% AP on YouTube-VIS 2019, 36.6% AP on YouTube-VIS 2021, and 21.9% AP on OVIS. Moreover, OWS-Seg achieves performance comparable to offline WSVIS methods and surpasses recent fully supervised methods, demonstrating its wide range of practical applications.
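Both branches are built on contrastive objectives. As a minimal illustration only (the abstract does not give the exact loss, so this is a generic InfoNCE-style sketch, not the paper's formulation), an instance-level contrastive loss pulls an anchor embedding toward a matching instance embedding while pushing it away from embeddings of other instances:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss on 1-D embeddings.

    anchor, positive: shape (d,); negatives: shape (k, d).
    This is an illustrative stand-in for an instance-level
    contrastive objective, not OWS-Seg's actual ICL loss.
    """
    def normalize(x):
        # project embeddings onto the unit hypersphere
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(anchor), normalize(positive), normalize(negatives)
    pos_sim = np.exp(a @ p / temperature)          # similarity to the positive
    neg_sim = np.exp(n @ a / temperature).sum()    # summed similarity to negatives
    return -np.log(pos_sim / (pos_sim + neg_sim))
```

A pixel-level variant would apply the same idea with foreground pixels as positives and background pixels as negatives; the loss shrinks as the anchor aligns with its positive and separates from the negatives.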
Ning, Y., Li, F., Dong, M., & Li, Z. (2023). OWS-Seg: Online Weakly Supervised Video Instance Segmentation via Contrastive Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14260 LNCS, pp. 476–488). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-44195-0_39