Video Object Segmentation in Panoptic Wild Scenes

Abstract

In this paper, we introduce semi-supervised video object segmentation (VOS) to panoptic wild scenes and present a large-scale benchmark as well as a baseline method for it. Previous VOS benchmarks, with their sparse annotations, are insufficient for training or evaluating a model that must handle all possible objects in real-world scenarios. Our new benchmark (VIPOSeg) contains exhaustive object annotations and covers diverse real-world object categories, which are carefully divided into subsets of thing/stuff and seen/unseen classes for comprehensive evaluation. To address the challenges of panoptic VOS, we propose a strong baseline named panoptic object association with transformers (PAOT), which associates multiple objects via panoptic identification in a pyramid architecture over multiple scales. Experimental results show that VIPOSeg can not only boost the performance of VOS models through panoptic training but also evaluate them comprehensively in panoptic scenes. Previous methods for classic VOS fall short in both performance and efficiency when dealing with panoptic scenes, whereas PAOT achieves SOTA performance with good efficiency on VIPOSeg and on previous VOS benchmarks. PAOT also ranked 1st in the VOT2022 challenge. Our dataset and code are available at https://github.com/yoxu515/VIPOSeg-Benchmark.
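The abstract describes PAOT only at a high level. As a rough illustration of the general idea, the following is a minimal PyTorch sketch of multi-scale object association with identity embeddings: reference masks are embedded into an identity space, attached to memory features, and matched to current-frame features by cross-attention at each pyramid level. All class names, shapes, and the fusion scheme (`IDBank`, `ScaleAssociator`, `PyramidAssociator`) are assumptions made for this sketch, not the paper's actual implementation; consult the linked repository for the real PAOT code.

```python
# Hypothetical sketch only; names and architecture details are assumptions,
# not the PAOT implementation from the paper's repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IDBank(nn.Module):
    """Maps per-object mask channels into a shared identity embedding space,
    so a variable number of objects can be associated in one forward pass."""
    def __init__(self, max_objects: int, dim: int):
        super().__init__()
        self.embed = nn.Conv2d(max_objects, dim, kernel_size=1)

    def forward(self, masks: torch.Tensor) -> torch.Tensor:
        # masks: (B, max_objects, H, W) one-hot reference masks
        return self.embed(masks)  # (B, dim, H, W)

class ScaleAssociator(nn.Module):
    """One pyramid level: cross-attend current-frame features to memory
    features that carry identity embeddings."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feat, memory_feat, memory_id):
        B, C, H, W = query_feat.shape
        q = query_feat.flatten(2).transpose(1, 2)               # (B, HW, C)
        kv = (memory_feat + memory_id).flatten(2).transpose(1, 2)
        out, _ = self.attn(q, kv, kv)
        out = self.norm(out + q)                                # residual
        return out.transpose(1, 2).view(B, C, H, W)

class PyramidAssociator(nn.Module):
    """Run association at several feature scales and fuse coarse-to-fine."""
    def __init__(self, dims=(256, 128), max_objects=64):
        super().__init__()
        self.id_banks = nn.ModuleList(IDBank(max_objects, d) for d in dims)
        self.levels = nn.ModuleList(ScaleAssociator(d) for d in dims)
        self.heads = nn.ModuleList(nn.Conv2d(d, max_objects, 1) for d in dims)

    def forward(self, query_feats, memory_feats, memory_masks):
        # query_feats / memory_feats: lists of (B, C_l, H_l, W_l), coarse to fine
        logits = None
        for l, (qf, mf) in enumerate(zip(query_feats, memory_feats)):
            m = F.interpolate(memory_masks, size=mf.shape[-2:], mode="nearest")
            out = self.levels[l](qf, mf, self.id_banks[l](m))
            lv = self.heads[l](out)
            logits = lv if logits is None else lv + F.interpolate(
                logits, size=lv.shape[-2:], mode="bilinear", align_corners=False)
        return logits  # (B, max_objects, H_fine, W_fine) per-object logits

# Toy usage with two pyramid levels and assumed feature shapes:
model = PyramidAssociator(dims=(256, 128), max_objects=8)
qf = [torch.randn(1, 256, 16, 16), torch.randn(1, 128, 32, 32)]
mf = [torch.randn(1, 256, 16, 16), torch.randn(1, 128, 32, 32)]
masks = torch.zeros(1, 8, 64, 64); masks[:, 0, :32] = 1.0
print(model(qf, mf, masks).shape)  # torch.Size([1, 8, 32, 32])
```

Embedding object identity rather than running one matching pass per object is what keeps the cost roughly constant as the object count grows, which matters in panoptic scenes where every thing and stuff region must be tracked.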

Citation (APA)
Xu, Y., Yang, Z., & Yang, Y. (2023). Video object segmentation in panoptic wild scenes. In IJCAI International Joint Conference on Artificial Intelligence (pp. 1604–1612). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2023/178
