In this paper, we address the problem of searching for action proposals in unconstrained video clips. Our approach starts from actionness estimation on frame-level bounding boxes, and then aggregates the boxes belonging to the same actor across frames, via linking, association, and tracking, into spatio-temporally continuous action paths. To this end, we first propose a novel actionness estimation method that exploits both human appearance and motion cues. The association of action paths is then formulated as a maximum set coverage problem, with the actionness estimates serving as a prior. To further improve performance, we design an improved optimization objective for this problem and provide a greedy search algorithm to solve it. Finally, a tracking-by-detection scheme further refines the searched action paths. Extensive experiments on two challenging datasets, UCF-Sports and UCF-101, show that the proposed approach advances the state of the art in proposal generation in terms of both accuracy and proposal quantity.
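The abstract formulates path association as maximum set coverage solved greedily. As context, a minimal sketch of the classic greedy approximation for maximum set coverage (pick, at each step, the set covering the most still-uncovered elements); this illustrates the general technique only, not the paper's improved objective, and all names here are illustrative:

```python
def greedy_max_coverage(sets, k):
    """Classic greedy (1 - 1/e)-approximation for maximum set coverage:
    choose up to k sets, each time taking the one that covers the most
    elements not yet covered."""
    covered = set()
    chosen = []
    candidates = list(sets)
    for _ in range(k):
        # Pick the candidate with the largest number of uncovered elements.
        best = max(candidates, key=lambda s: len(s - covered), default=None)
        if best is None or not (best - covered):
            break  # no candidate adds new coverage
        chosen.append(best)
        covered |= best
        candidates.remove(best)
    return chosen, covered
```

In the paper's setting, each candidate set would correspond to a temporal action path and the covered elements to high-actionness frame-level boxes; the exact scoring differs per their improved objective.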
Li, N., Xu, D., Ying, Z., Li, Z., & Li, G. (2017). Searching action proposals via spatial actionness estimation and temporal path inference and tracking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10112 LNCS, pp. 384–399). Springer Verlag. https://doi.org/10.1007/978-3-319-54184-6_24