Abstract
In this paper, we propose a weakly supervised temporal action localization method for untrimmed videos based on prototypical networks. We observe two challenges posed by weak supervision, namely action-background separation and action relation construction. Unlike previous methods, we propose to achieve action-background separation using only the original videos. To this end, a clustering loss is adopted to separate actions from backgrounds and learn intra-compact features, which helps in detecting complete action instances. In addition, a similarity weighting module is devised to further separate actions from backgrounds. To identify actions effectively, we propose to construct relations among actions for prototype learning. A GCN-based prototype embedding module is introduced to generate relational prototypes. Experiments on the THUMOS14 and ActivityNet1.2 datasets show that our method outperforms the state-of-the-art methods.
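To illustrate the prototypical-network idea the abstract builds on, here is a minimal sketch: class prototypes are the mean embedding of each class's snippets, and snippets are scored against prototypes by cosine similarity. This is a generic illustration, not the paper's method; all function names are hypothetical, and the paper's relational prototypes are further refined by a GCN, which is not shown.

```python
import numpy as np

def prototypes(features, labels, num_classes):
    """Class prototype = mean embedding of that class's snippets.
    (Generic prototypical-network idea; the paper's GCN-based
    relational refinement is omitted.)"""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def cosine_scores(features, protos):
    """Score each snippet against each prototype by cosine similarity."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return f @ p.T

# Toy example: 6 snippet embeddings, 2 classes (e.g. action vs. background)
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
labels = np.array([0, 0, 0, 1, 1, 1])
protos = prototypes(feats, labels, 2)
scores = cosine_scores(feats, protos)
print(scores.shape)  # (6, 2)
```

In the weakly supervised setting, such snippet-prototype similarities can serve as class activation scores that are aggregated into a video-level prediction, since only video-level labels are available.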
Citation
Huang, L., Huang, Y., Ouyang, W., & Wang, L. (2020). Relational prototypical network for weakly supervised temporal action localization. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 11053–11060). AAAI press. https://doi.org/10.1609/aaai.v34i07.6760