Spatio-Temporal Transformer for Online Video Understanding


Abstract

Leading methods for online video understanding extract useful information from the spatial and temporal dimensions of an input video, but they suffer from two problems: (1) they can only extract local video information and cannot relate it to important features of the temporal context of the video; (2) although some methods can quickly process the information in each individual frame, their efficiency over the whole video is poor, so they cannot be applied to online video understanding tasks. This article introduces a Transformer-based network that considers both spatial and temporal content and processes each video quickly. Our approach can efficiently handle up to 170 videos with hundreds of frames per second for action classification, running 10 to 90 times faster than existing methods on action classification datasets.
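The abstract does not spell out the paper's exact attention design, but a common way for video Transformers to consider spatial and temporal content jointly, while staying cheaper than full space-time attention, is to factorize self-attention into a temporal pass (each spatial patch attends across frames) followed by a spatial pass (patches within a frame attend to one another). The sketch below is a generic, illustrative version of that idea in NumPy, not the authors' implementation; it omits learned projections, multiple heads, and residual connections for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # single-head self-attention without learned Q/K/V projections
    # (illustrative only; real models project x before attending)
    d = x.shape[-1]
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores) @ x                    # (..., L, d)

def factorized_spacetime_attention(tokens):
    # tokens: (T frames, N spatial patches, d channels)
    # temporal pass: each spatial patch attends across the T frames
    t_in = tokens.transpose(1, 0, 2)               # (N, T, d)
    t_out = attention(t_in).transpose(1, 0, 2)     # back to (T, N, d)
    # spatial pass: within each frame, patches attend to one another
    return attention(t_out)                        # (T, N, d)

x = np.random.default_rng(0).normal(size=(8, 16, 32))
y = factorized_spacetime_attention(x)
print(y.shape)  # (8, 16, 32)
```

Factorizing the two passes reduces the attention cost from O((T·N)^2) for full space-time attention to O(T^2·N + N^2·T), which is what makes per-frame processing fast enough for online settings.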

Citation (APA)

Du, Z., Zhang, G., Lu, W., Zhao, T., & Wu, P. (2022). Spatio-Temporal Transformer for Online Video Understanding. In Journal of Physics: Conference Series (Vol. 2171). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/2171/1/012020
