Learning What to Learn for Video Object Segmentation

Abstract

Video object segmentation (VOS) is a highly challenging problem, since the target object is only defined by a first-frame reference mask during inference. The problem of how to capture and utilize this limited information to accurately segment the target remains a fundamental research question. We address this by introducing an end-to-end trainable VOS architecture that integrates a differentiable few-shot learner. Our learner is designed to predict a powerful parametric model of the target by minimizing a segmentation error in the first frame. We further go beyond the standard few-shot learning paradigm by learning what our target model should learn in order to maximize segmentation accuracy. We perform extensive experiments on standard benchmarks. Our approach sets a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5, corresponding to a 2.6% relative improvement over the previous best result. The code and models are available at https://github.com/visionml/pytracking.
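
The central idea, fitting a parametric target model by minimizing a segmentation error on the first frame, can be illustrated with a minimal sketch. This is not the authors' implementation, which is available in the repository linked above. It assumes a linear per-pixel model, a ridge-regularized squared error, steepest descent with an exact step length, and synthetic features. The names `fit_target_model` and `segment` are hypothetical. The learned label encoding that the abstract alludes to ("learning what to learn") is replaced here by the raw binary mask.

```python
import numpy as np


def fit_target_model(features, mask, num_iters=5, reg=1e-2):
    """Fit a linear per-pixel target model w on the first frame.

    Assumed setup (not taken from the paper):
      features: (N, D) array, one D-dimensional feature per first-frame pixel.
      mask:     (N,) array in {0, 1}, the first-frame reference mask.
    Minimizes 0.5 * ||features @ w - mask||^2 + 0.5 * reg * ||w||^2.
    """
    w = np.zeros(features.shape[1])
    for _ in range(num_iters):
        residual = features @ w - mask
        grad = features.T @ residual + reg * w
        # The objective is quadratic, so the optimal step along -grad
        # has the closed form (g.g) / (g.H.g), with H = X^T X + reg * I.
        hess_grad = features.T @ (features @ grad) + reg * grad
        denom = grad @ hess_grad
        if denom <= 1e-12:  # gradient vanished: already at the minimum
            break
        w = w - (grad @ grad) / denom * grad
    return w


def segment(features, w, threshold=0.5):
    """Apply the fitted target model to the features of any frame."""
    return (features @ w) > threshold


# Synthetic stand-in for backbone features: target pixels are shifted
# along a fixed direction, background pixels are pure noise.
rng = np.random.default_rng(0)
mask = (rng.random(400) < 0.3).astype(float)
features = mask[:, None] * np.ones(8) + 0.05 * rng.standard_normal((400, 8))

w = fit_target_model(features, mask)
prediction = segment(features, w)
```

Because every step of the fitting loop is a differentiable operation, the same loop written in an automatic differentiation framework could be unrolled inside a network and trained end-to-end, which is the property the abstract relies on.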

Cite

Bhat, G., Lawin, F. J., Danelljan, M., Robinson, A., Felsberg, M., Van Gool, L., & Timofte, R. (2020). Learning What to Learn for Video Object Segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12347 LNCS, pp. 777–794). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58536-5_46
