Video imprint segmentation for temporal action detection in untrimmed videos

Abstract

We propose a temporal-action-detection-by-spatial-segmentation framework that simultaneously categorizes actions and temporally localizes action instances in untrimmed videos. The core idea is to convert the temporal detection task into a spatial semantic segmentation task. First, the video imprint representation is employed to capture the spatial/temporal interdependencies within/among frames and represent them as spatial proximity in a feature space. The resulting imprint representation is then spatially segmented by a fully convolutional network. By projecting the segmentation labels back to the video space, both temporal action boundaries and per-frame spatial annotations are obtained simultaneously. Because the underlying imprint representation has a fixed size, the framework is robust to the variable lengths of untrimmed videos. The efficacy of the framework is validated on two public action detection datasets.
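To make the pipeline concrete, below is a minimal PyTorch sketch of the detection-by-segmentation idea as the abstract describes it. It is an illustration under stated assumptions, not the authors' architecture: the class name ImprintSegmenter, the soft frame-to-cell assignment standing in for the paper's actual imprint construction, and all dimensions are hypothetical. Frames of an untrimmed video are pooled into a fixed-size imprint grid, a small fully convolutional head labels each grid cell, and those labels are projected back onto the frames, so the output size tracks the video length while the segmented representation does not.

```python
import torch
import torch.nn as nn


class ImprintSegmenter(nn.Module):
    """Hypothetical sketch: frame features -> fixed-size imprint -> FCN
    segmentation -> per-frame action labels. Names and sizes are assumptions."""

    def __init__(self, feat_dim=2048, grid=32, num_classes=21):
        super().__init__()
        self.grid = grid
        # soft assignment of each frame to one of grid*grid imprint cells
        self.assign = nn.Linear(feat_dim, grid * grid)
        # fully convolutional head that labels every imprint cell
        self.fcn = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, frame_feats):
        # frame_feats: (T, feat_dim) features of an untrimmed T-frame video
        weights = torch.softmax(self.assign(frame_feats), dim=1)  # (T, S*S)
        # pool frames into the imprint; its size is independent of T
        imprint = weights.t() @ frame_feats                       # (S*S, feat_dim)
        imprint = imprint.t().reshape(1, -1, self.grid, self.grid)
        cell_logits = self.fcn(imprint).flatten(2).squeeze(0)     # (C, S*S)
        # project cell labels back onto the video's frames
        frame_logits = weights @ cell_logits.t()                  # (T, C)
        return frame_logits.argmax(dim=1)                         # per-frame labels


if __name__ == "__main__":
    feats = torch.randn(300, 2048)         # e.g. per-frame CNN features
    labels = ImprintSegmenter()(feats)     # (300,) frame-level action labels
    print(labels.shape)
```

In this sketch, temporal action boundaries fall out of contiguous runs of identical frame labels, mirroring how segmentation labels projected back to the video space yield boundary localization in the paper.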

Cite (APA)

Gao, Z., Wang, L., Zhang, Q., Niu, Z., Zheng, N., & Hua, G. (2019). Video imprint segmentation for temporal action detection in untrimmed videos. In 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (pp. 8328–8335). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33018328
