AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos

Zheng Shou; Hang Gao; Lei Zhang; Kazuyuki Miyazawa; Shih Fu Chang

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11220 LNCS 162-179

DOI: 10.1007/978-3-030-01270-0_10

53 Citations

205 Readers

Abstract

Temporal Action Localization (TAL) in untrimmed video is important for many applications, but annotating segment-level ground truth (action class and temporal boundary) is very expensive. This has raised interest in addressing TAL with weak supervision, in which only video-level annotations are available during training. However, state-of-the-art weakly-supervised TAL methods focus only on generating a good Class Activation Sequence (CAS) over time and then apply simple thresholding to the CAS to localize actions. In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance. We then propose a novel Outer-Inner-Contrastive (OIC) loss to automatically discover the segment-level supervision needed for training such a boundary predictor. Our method achieves dramatically improved performance: at an IoU threshold of 0.5, it improves mAP on THUMOS’14 from 13.7% to 21.2% and mAP on ActivityNet from 7.4% to 27.3%. It is also encouraging that our weakly-supervised method achieves results comparable to those of some fully-supervised methods.
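
The ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. `threshold_cas` shows the simple thresholding baseline that the abstract contrasts against, and `temporal_iou` shows the standard temporal IoU used for the 0.5 criterion. `oic_loss` is an assumption: the abstract does not define the loss, so the sketch takes one plausible reading of "Outer-Inner-Contrastive", the mean activation in a band just outside a segment minus the mean activation inside it. All function names and the `inflate` ratio are hypothetical.

```python
import numpy as np


def threshold_cas(cas, thresh):
    """Baseline: return [start, end) index pairs of maximal runs where cas > thresh."""
    above = np.asarray(cas, dtype=float) > thresh
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[0::2], edges[1::2])]


def temporal_iou(a, b):
    """Intersection over union of two 1-D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def oic_loss(cas, start, end, inflate=0.25):
    """Assumed reading of an outer-inner contrast for the segment [start, end).

    Returns mean(outer band) - mean(inner); lower values mean the segment
    covers high activations while its immediate surroundings are low.
    The `inflate` ratio is a hypothetical parameter, not taken from the source.
    """
    cas = np.asarray(cas, dtype=float)
    pad = max(1, int(round(inflate * (end - start))))
    outer_lo, outer_hi = max(0, start - pad), min(len(cas), end + pad)
    outer = np.concatenate((cas[outer_lo:start], cas[end:outer_hi]))
    outer_mean = outer.mean() if outer.size else 0.0
    return float(outer_mean - cas[start:end].mean())


# Toy activation sequence for one action class (made-up values).
cas = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1]
segments = threshold_cas(cas, 0.5)
tight = oic_loss(cas, 1, 3)   # segment hugging the high-activation run
loose = oic_loss(cas, 0, 4)   # segment that also swallows low activations
```

Under this reading, a segment that tightly covers a high-activation run scores lower than a looser one. That suggests how such a loss could act as a training signal for a boundary predictor when no segment-level labels are available.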

Cite

Shou, Z., Gao, H., Zhang, L., Miyazawa, K., & Chang, S. F. (2018). AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11220 LNCS, pp. 162–179). Springer Verlag. https://doi.org/10.1007/978-3-030-01270-0_10
