Zero-shot event detection by multimodal distributional semantic embedding of videos

Abstract

We propose a new zero-shot event detection method based on a multimodal distributional semantic embedding of videos. Our model embeds object and action concepts, as well as other available modalities, from videos into a distributional semantic space. To our knowledge, this is the first zero-shot event detection model built on top of distributional semantics, and it extends it in the following directions: (a) semantic embedding of multimodal information in videos (with a focus on the visual modalities), (b) automatically determining the relevance of concepts/attributes to a free-text query, which could be useful for other applications, and (c) retrieving videos by a free-text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between videos and the event query in free-text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state of the art, which relies on long textual event descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. It is also an order of magnitude faster.
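To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of zero-shot retrieval with a distributional semantic embedding: each video is mapped into a word-vector space as a confidence-weighted combination of its detected concepts, the free-text query is mapped into the same space, and videos are ranked by cosine similarity. The tiny hand-made word vectors, the detector scores, and all names below are hypothetical stand-ins for a pretrained embedding (e.g., word2vec-style vectors) and real concept detectors.

```python
import numpy as np

# Toy distributional word vectors standing in for a pretrained embedding.
# The 4-dimensional values are illustrative only.
WORD_VECS = {
    "changing": np.array([0.9, 0.1, 0.0, 0.2]),
    "vehicle":  np.array([0.1, 0.8, 0.3, 0.0]),
    "tire":     np.array([0.2, 0.7, 0.4, 0.1]),
    "car":      np.array([0.1, 0.9, 0.2, 0.0]),
    "wrench":   np.array([0.3, 0.5, 0.6, 0.1]),
    "dog":      np.array([0.0, 0.1, 0.1, 0.9]),
    "park":     np.array([0.1, 0.0, 0.2, 0.8]),
}

def embed_text(words):
    """Embed a free-text query as the mean of its word vectors."""
    vecs = [WORD_VECS[w] for w in words if w in WORD_VECS]
    return np.mean(vecs, axis=0)

def embed_video(concept_scores):
    """Embed a video as a confidence-weighted average of the word vectors
    of its detected object/action concepts."""
    v, total = np.zeros(4), 0.0
    for concept, score in concept_scores.items():
        if concept in WORD_VECS:
            v += score * WORD_VECS[concept]
            total += score
    return v / max(total, 1e-8)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Hypothetical concept-detector outputs for two videos.
videos = {
    "tire_change_clip": {"car": 0.9, "tire": 0.8, "wrench": 0.6},
    "dog_park_clip":    {"dog": 0.95, "park": 0.7},
}

query = embed_text(["changing", "vehicle", "tire"])
ranked = sorted(videos.items(),
                key=lambda kv: cosine(embed_video(kv[1]), query),
                reverse=True)
for name, concepts in ranked:
    print(name, round(cosine(embed_video(concepts), query), 3))
```

In this sketch the tire-change clip scores highest for the query "changing a vehicle tire" without any event-specific training, which is the zero-shot behavior the abstract describes; the paper's full model additionally handles multiple modalities and concept-to-query relevance weighting.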

Citation (APA)

Elhoseiny, M., Liu, J., Cheng, H., Sawhney, H., & Elgammal, A. (2016). Zero-shot event detection by multimodal distributional semantic embedding of videos. In 30th AAAI Conference on Artificial Intelligence, AAAI 2016 (pp. 3478–3486). AAAI Press. https://doi.org/10.1609/aaai.v30i1.10458
