Given a video and associated text, we propose an automatic annotation scheme in which we employ a latent topic model to generate topic distributions from weighted text and then modify these distributions based on visual similarity. We apply this scheme to location annotation of a television series for which transcripts are available. The topic distributions allow us to avoid explicit classification, which is useful when the exact number of locations is unknown. Moreover, many locations are unique to a single episode, making it impossible to obtain representative training data for a supervised approach. Our method first segments the episode into scenes by fusing cues from both images and text. We then assign location-oriented weights to the text and generate topic distributions for each scene using Latent Dirichlet Allocation. Finally, we update the topic distributions using the distributions of visually similar scenes, where visual similarity between scenes is formulated as an Earth Mover's Distance problem. We quantitatively validate our multi-modal approach to segmentation and qualitatively evaluate the resulting location annotations. Our results demonstrate that we are able to generate accurate annotations, even for locations seen in only a single episode.
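To make the pipeline concrete, the sketch below illustrates the three core steps described in the abstract: fitting LDA on (already weighted and tokenised) per-scene text, measuring visual similarity between scenes with an Earth Mover's Distance over keyframe-descriptor signatures, and blending each scene's topic distribution with those of its nearest visual neighbours. It is a minimal sketch, not the authors' implementation: it assumes the gensim library for LDA and the POT (Python Optimal Transport) library for EMD, and the function names, neighbour count, and mixing weight alpha are illustrative choices rather than the weighting scheme or update rule used in the paper.

import numpy as np
import ot                                   # POT: Python Optimal Transport
from gensim import corpora, models

def scene_topic_distributions(scene_tokens, num_topics=20):
    """Fit LDA on the (location-weighted) scene texts and return one
    topic distribution per scene as a dense array of shape (scenes, topics)."""
    dictionary = corpora.Dictionary(scene_tokens)
    bows = [dictionary.doc2bow(toks) for toks in scene_tokens]
    lda = models.LdaModel(bows, num_topics=num_topics, id2word=dictionary)
    theta = np.zeros((len(bows), num_topics))
    for i, bow in enumerate(bows):
        for k, p in lda.get_document_topics(bow, minimum_probability=0.0):
            theta[i, k] = p
    return theta

def emd_scene_distance(feats_a, w_a, feats_b, w_b):
    """EMD between two scenes' descriptor signatures.
    feats_*: (n_i, d) descriptor matrices; w_*: nonnegative weights summing to 1."""
    cost = ot.dist(feats_a, feats_b)        # pairwise (squared Euclidean) ground costs
    return ot.emd2(w_a, w_b, cost)          # optimal-transport cost = EMD

def refine_with_visual_neighbours(theta, scene_feats, scene_weights,
                                  n_neighbours=3, alpha=0.5):
    """Blend each scene's topic distribution with the mean distribution of its
    visually most similar scenes; alpha is an assumed mixing weight."""
    n_scenes = len(theta)
    dist = np.zeros((n_scenes, n_scenes))
    for i in range(n_scenes):
        for j in range(i + 1, n_scenes):
            d = emd_scene_distance(scene_feats[i], scene_weights[i],
                                   scene_feats[j], scene_weights[j])
            dist[i, j] = dist[j, i] = d
    refined = theta.copy()
    for i in range(n_scenes):
        neighbours = np.argsort(dist[i])[1:n_neighbours + 1]   # skip the scene itself
        refined[i] = alpha * theta[i] + (1 - alpha) * theta[neighbours].mean(axis=0)
        refined[i] /= refined[i].sum()                          # renormalise
    return refined

In this sketch the refined distribution for each scene remains a proper probability distribution over topics, so it can still be used directly for ranking candidate location labels without an explicit classifier, in the spirit of the approach described above.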
Engels, C., Deschacht, K., Becker, J. H., Tuytelaars, T., Moens, M. F., & Van Gool, L. (2010). Automatic annotation of unique locations from video and text. In British Machine Vision Conference, BMVC 2010 - Proceedings. British Machine Vision Association, BMVA. https://doi.org/10.5244/C.24.115