Attention-based two-phase model for video action detection


Abstract

This paper considers the task of action detection in long untrimmed videos. Existing methods tend to process every frame or fragment of a video before making detection decisions, which is not only time-consuming but also computationally expensive. Instead, we present an attention-based model that performs action detection by watching only a few fragments; its cost is independent of video length, so it can be applied to real-world videos. Our approach is inspired by the observation that humans focus their attention sequentially on different frames of a video to quickly narrow down the extent where an action occurs. Our model is a two-phase architecture. In the first phase, a temporal proposal network predicts temporal proposals for multi-category actions: it observes a fixed number of locations in a video, predicts action bounds, and learns a location transfer policy. In the second phase, a well-trained classifier extracts visual information from the proposals, classifies the action, and decides whether to adopt each proposal. We evaluate our model on the ActivityNet dataset and show that it significantly outperforms the baseline.
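The abstract describes the two-phase design only at a high level. Below is a minimal PyTorch sketch of how such an architecture could be wired up; every module name, feature dimension, and the glimpse count are illustrative assumptions rather than the authors' implementation (the paper's training losses and policy-gradient details are not reproduced here), and pre-extracted frame features are stood in for by random tensors.

# A minimal sketch of the two-phase architecture, under the assumptions above.
import torch
import torch.nn as nn

class ProposalNetwork(nn.Module):
    """Phase 1: observe a fixed number of video locations with an RNN,
    emit a temporal proposal (centre, width) per glimpse, and produce the
    next observation location (the 'location transfer' policy)."""

    def __init__(self, feat_dim=512, hid_dim=256, num_glimpses=6):
        super().__init__()
        self.num_glimpses = num_glimpses
        self.rnn = nn.GRUCell(feat_dim + 1, hid_dim)  # glimpse feature + its location
        self.loc_head = nn.Linear(hid_dim, 1)         # next location in [0, 1]
        self.prop_head = nn.Linear(hid_dim, 2)        # proposal (centre, width)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) pre-extracted features
        b, t, _ = frame_feats.shape
        h = frame_feats.new_zeros(b, self.rnn.hidden_size)
        loc = frame_feats.new_full((b, 1), 0.5)       # start looking mid-video
        proposals = []
        for _ in range(self.num_glimpses):
            idx = (loc.squeeze(1) * (t - 1)).round().long()
            glimpse = frame_feats[torch.arange(b), idx]          # (b, feat_dim)
            h = self.rnn(torch.cat([glimpse, loc], dim=1), h)
            proposals.append(torch.sigmoid(self.prop_head(h)))   # (centre, width)
            loc = torch.sigmoid(self.loc_head(h))                # location transfer
        return torch.stack(proposals, dim=1)                     # (b, glimpses, 2)

class ProposalClassifier(nn.Module):
    """Phase 2: pool features inside each proposal, classify the action,
    and score whether the proposal should be adopted."""

    def __init__(self, feat_dim=512, num_classes=200):
        super().__init__()
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.adopt_head = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats, proposals):
        b, t, _ = frame_feats.shape
        pooled = []
        for i in range(proposals.shape[1]):
            centre, width = proposals[:, i, 0], proposals[:, i, 1]
            start = ((centre - width / 2).clamp(0, 1) * (t - 1)).long()
            end = ((centre + width / 2).clamp(0, 1) * (t - 1)).long()
            seg = [frame_feats[j, s : e + 1].mean(0)             # mean-pool segment
                   for j, (s, e) in enumerate(zip(start, end))]
            pooled.append(torch.stack(seg))
        pooled = torch.stack(pooled, dim=1)                      # (b, glimpses, d)
        return self.cls_head(pooled), torch.sigmoid(self.adopt_head(pooled))

# Usage on dummy features: 2 videos, 300 frames each.
feats = torch.randn(2, 300, 512)
props = ProposalNetwork()(feats)                  # (2, 6, 2) temporal proposals
logits, adopt = ProposalClassifier()(feats, props)
print(props.shape, logits.shape, adopt.shape)

Note that the number of glimpses is fixed regardless of video length, which is what makes the cost of phase one independent of the number of frames; only the short segments selected by the proposals are examined closely in phase two.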

Citation (APA)

Chen, X., Wang, W., Li, W., & Wang, J. (2017). Attention-based two-phase model for video action detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10425 LNCS, pp. 81–93). Springer Verlag. https://doi.org/10.1007/978-3-319-64698-5_8
