Spot on: Action localization from pointly-supervised proposals

Abstract

We strive for spatio-temporal localization of actions in videos. The state of the art relies on action proposals at test time and selects the best one with a classifier trained on carefully annotated boxes. Annotating action boxes in video is cumbersome, tedious, and error-prone. Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frames only. We introduce an overlap measure between action proposals and points and incorporate them all into the objective of a non-convex Multiple Instance Learning optimization. Experimental evaluation on the UCF Sports and UCF 101 datasets shows that (i) spatio-temporal proposals can be used to train classifiers while retaining the localization performance, (ii) point annotations yield results comparable to box annotations while being significantly faster to annotate, and (iii) with a minimum amount of supervision our approach is competitive with the state of the art. Finally, we introduce spatio-temporal action annotations on the train and test videos of Hollywood2, resulting in Hollywood2Tubes, available at http://tinyurl.com/hollywood2tubes.
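
To make the idea of scoring proposals against point annotations concrete, here is a minimal Python sketch. It is not the overlap measure defined in the paper, which this abstract does not spell out. It assumes a proposal is a tube mapping frame indices to boxes, that points are given on a sparse set of frames, and that a simple "fraction of annotated points falling inside the tube" score is an acceptable stand-in. The names `point_overlap` and `best_proposal` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]              # (x, y)


def point_overlap(tube: Dict[int, Box], points: Dict[int, Point]) -> float:
    """Fraction of annotated points that fall inside the tube's box
    on the same frame. Illustrative stand-in, not the paper's measure."""
    if not points:
        return 0.0
    hits = 0
    for frame, (px, py) in points.items():
        box = tube.get(frame)
        if box is None:
            continue  # the tube does not cover this annotated frame
        x_min, y_min, x_max, y_max = box
        if x_min <= px <= x_max and y_min <= py <= y_max:
            hits += 1
    return hits / len(points)


def best_proposal(proposals: List[Dict[int, Box]],
                  points: Dict[int, Point]) -> int:
    """Index of the proposal with the highest point overlap."""
    return max(range(len(proposals)),
               key=lambda i: point_overlap(proposals[i], points))


# Two toy tubes and point annotations on frames 0 and 5 only.
tube_a = {0: (0, 0, 10, 10), 5: (0, 0, 10, 10)}
tube_b = {0: (20, 20, 30, 30), 5: (0, 0, 10, 10)}
points = {0: (5, 5), 5: (5, 5)}
print(best_proposal([tube_b, tube_a], points))
```

In a Multiple Instance Learning setting, a score of this kind could guide which proposal in each training video is treated as the positive instance. How the paper combines its measure with the objective is described in the full text, not here.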

Cite

Mettes, P., van Gemert, J. C., & Snoek, C. G. M. (2016). Spot on: Action localization from pointly-supervised proposals. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9909 LNCS, pp. 437–453). Springer Verlag. https://doi.org/10.1007/978-3-319-46454-1_27
