Spatio-temporal action detection with cascade proposal and location anticipation

Zhenheng Yang; Jiyang Gao; Ram Nevatia

Conference Proceedings (Open Access)

British Machine Vision Conference 2017, BMVC 2017 (2017)

DOI: 10.5244/c.31.95

44 Citations

52 Readers

Abstract

In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos. The task is both important and challenging, since accurately localizing human actions in space and time is essential for analyzing large-scale video data. To tackle this problem, we propose a cascade proposal and location anticipation (CPLA) model for frame-level action detection. The model has two salient features: (1) a cascade region proposal network (casRPN) is adopted for action proposal generation and shows better localization accuracy than a single region proposal network (RPN); (2) the spatio-temporal consistency of actions is exploited via a location anticipation network (LAN), so that frame-level action detection is not conducted independently for each frame. Frame-level detections are then linked by solving a linking score maximization problem and temporally trimmed into spatio-temporal action tubes. We demonstrate the effectiveness of our model on the challenging UCF101 and LIRIS-HARL datasets, achieving state-of-the-art performance on both.
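
The abstract does not define the linking score, so the following is only a minimal sketch of how frame-level detections could be linked by score maximization. It assumes a pairwise score of the form "previous class score + current class score + weighted box overlap", which is a common choice in the action-tube literature and not necessarily the one used in the paper. The names `link_detections`, `iou`, and `overlap_weight` are hypothetical, and the temporal trimming step is not shown.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, class score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Detection]],
                    overlap_weight: float = 1.0) -> List[int]:
    """Pick one detection index per frame so the summed linking score is maximal.

    Assumed (not taken from the paper) pairwise linking score:
        s_prev + s_cur + overlap_weight * IoU(box_prev, box_cur)
    The maximization is solved exactly by dynamic programming (Viterbi).
    """
    if not frames or any(len(f) == 0 for f in frames):
        raise ValueError("every frame needs at least one detection")

    best = [0.0] * len(frames[0])   # best accumulated score ending at each detection
    back: List[List[int]] = []      # back-pointers, one list per transition

    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        new_best, pointers = [], []
        for box_c, s_c in cur:
            candidates = [
                best[i] + s_p + s_c + overlap_weight * iou(box_p, box_c)
                for i, (box_p, s_p) in enumerate(prev)
            ]
            i_best = max(range(len(candidates)), key=candidates.__getitem__)
            new_best.append(candidates[i_best])
            pointers.append(i_best)
        best = new_best
        back.append(pointers)

    # Backtrack from the best final detection to recover the full path.
    idx = max(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return path
```

With a positive `overlap_weight`, the path favors detections that overlap from frame to frame even when an isolated box has a slightly higher class score; setting it to zero reduces the problem to chaining the highest-scoring boxes regardless of position.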

Cite

Yang, Z., Gao, J., & Nevatia, R. (2017). Spatio-temporal action detection with cascade proposal and location anticipation. In British Machine Vision Conference 2017, BMVC 2017. BMVA Press. https://doi.org/10.5244/c.31.95
