Weakly-supervised video scene co-parsing

Abstract

In this paper, we propose a scene co-parsing framework that assigns pixel-wise semantic labels in weakly-labeled videos, i.e., videos for which only video-level category labels are given. To exploit rich semantic information, we first collect all videos that share the same video-level labels and segment them into supervoxels. We then select representative supervoxels for each category via a supervoxel ranking process. This ranking problem is formulated with a submodular objective function, into which a scene-object classifier is incorporated to distinguish scenes from objects. To assign each supervoxel a semantic label, we match it against the selected representatives in the feature domain. Each supervoxel is thereby associated with a set of category potentials and is assigned the label with the maximum potential. The proposed co-parsing framework extends scene parsing from single images to videos and exploits the mutual information within a video collection. Experimental results on the Wild-8 and SUNY-24 datasets show that the proposed algorithm performs favorably against state-of-the-art approaches.
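
The two-stage pipeline described above can be sketched in a few lines of code. The snippet below is a minimal illustration, not the authors' implementation: it assumes supervoxels have already been reduced to feature vectors, swaps the paper's submodular objective (and its scene-object classifier term) for a generic facility-location coverage score maximized with the standard greedy rule, and scores category potentials by best-match similarity to the selected representatives. All function names and data layouts are hypothetical.

```python
import numpy as np

def rank_supervoxels(features, k):
    """Greedily pick k representative supervoxels.

    Illustrative stand-in for the paper's submodular ranking:
    here the (hypothetical) objective is a facility-location
    coverage score, so the greedy rule gives the usual
    (1 - 1/e) approximation guarantee for submodular maximization.
    """
    sims = features @ features.T              # pairwise similarities
    n = features.shape[0]
    selected, cover = [], np.zeros(n)
    for _ in range(k):
        # marginal coverage gain of adding each candidate
        gains = np.maximum(sims, cover).sum(axis=1) - cover.sum()
        gains[selected] = -np.inf              # never re-pick
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sims[j])     # update best coverage
    return selected

def assign_labels(query_feats, reps_by_category):
    """Label each query supervoxel with the category whose
    representatives it matches best (maximum category potential)."""
    labels = []
    for f in query_feats:
        potentials = {
            c: max(float(f @ r) for r in reps)  # best match per category
            for c, reps in reps_by_category.items()
        }
        labels.append(max(potentials, key=potentials.get))
    return labels

# Usage sketch on toy data (hypothetical categories):
# feats = np.random.rand(100, 64)
# reps = {c: feats[rank_supervoxels(feats, 5)] for c in ("beach", "street")}
# print(assign_labels(feats[:3], reps))
```

In this sketch the category potential of a supervoxel is simply its best similarity to any representative of that category; the paper instead derives potentials from feature-domain matching against per-category representatives, so the argmax assignment step is the same in spirit.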

Citation (APA)
Zhong, G., Tsai, Y. H., & Yang, M. H. (2017). Weakly-supervised video scene co-parsing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10111 LNCS, pp. 20–36). Springer Verlag. https://doi.org/10.1007/978-3-319-54181-5_2
