Video object linguistic grounding

Citations: 1 · Mendeley readers: 5

Abstract

The goal of this work is to segment, in a video sequence, the objects mentioned in a linguistic description of the scene. We adapt an existing deep neural network that achieves state-of-the-art performance in semi-supervised video object segmentation, adding a linguistic branch that generates an attention map over the video frames so that the segmentation of the objects remains temporally consistent along the sequence.
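The linguistic branch described above can be sketched, in highly simplified form, as an attention map computed between a sentence embedding and the spatial features of each frame. The dot-product scoring, function names, and tensor shapes below are illustrative assumptions for this sketch, not the architecture actually used in the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def linguistic_attention(frame_feats, text_emb):
    """Attention map over the spatial locations of each frame.

    frame_feats: (T, H, W, D) visual features for T frames (hypothetical shape)
    text_emb:    (D,) embedding of the linguistic description (hypothetical)
    returns:     (T, H, W) attention maps, each summing to 1 over H*W
    """
    T, H, W, D = frame_feats.shape
    # Score every spatial location against the text embedding
    scores = frame_feats.reshape(T, H * W, D) @ text_emb   # (T, H*W)
    # Scaled softmax turns scores into a per-frame attention distribution
    attn = softmax(scores / np.sqrt(D), axis=-1)
    return attn.reshape(T, H, W)

# Toy example: 3 frames, a 4x4 spatial grid, 8-dim features
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 4, 8))
text = rng.normal(size=8)
attn = linguistic_attention(feats, text)
```

Because the same text embedding scores every frame, regions matching the description are highlighted consistently across time, which is the intuition behind using the map to keep segmentations temporally consistent.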

Citation (APA)

Herrera-Palacio, A., Ventura, C., & Giro-I-Nieto, X. (2019). Video object linguistic grounding. In MULEA 2019 - 1st International Workshop on Multimodal Understanding and Learning for Embodied Applications, co-located with MM 2019 (pp. 49–51). Association for Computing Machinery, Inc. https://doi.org/10.1145/3347450.3357662
