MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning

Abstract

Recently, MLP-Like networks have been revived for image recognition. However, whether a generic MLP-Like architecture can be built for the video domain remains unexplored, because spatial-temporal modeling is complex and carries a large computational burden. To fill this gap, we present an efficient, self-attention-free backbone, namely MorphMLP, which flexibly leverages the concise Fully-Connected (FC) layer for video representation learning. Specifically, a MorphMLP block consists of two key layers in sequence, MorphFC_s and MorphFC_t, for spatial and temporal modeling respectively. MorphFC_s effectively captures the core semantics in each frame through progressive token interaction along both the height and width dimensions. MorphFC_t, in turn, adaptively learns long-term dependencies across frames by aggregating temporal tokens at each spatial location. With such multi-dimension and multi-scale factorization, the MorphMLP block achieves a strong accuracy-computation trade-off. Finally, we evaluate MorphMLP on a number of popular video benchmarks. Compared with recent state-of-the-art models, MorphMLP significantly reduces computation while improving accuracy: MorphMLP-S uses only 50% of the GFLOPs of VideoSwin-T yet achieves a 0.9% top-1 improvement on Kinetics400 under ImageNet1K pretraining, and MorphMLP-B uses only 43% of the GFLOPs of MViT-B yet achieves a 2.4% top-1 improvement on SSV2, even though MorphMLP-B is pretrained on ImageNet1K while MViT-B is pretrained on Kinetics400. Moreover, our method, adapted to the image domain, outperforms previous state-of-the-art MLP-Like architectures. Code is available at https://github.com/MTLab/MorphMLP.
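To make the factorized design concrete, below is a minimal PyTorch sketch of a block in this spirit: one FC stage mixes tokens along the width and height axes within each frame, and a second FC stage mixes tokens across frames at each spatial location. The class name MorphBlockSketch, the normalization/residual layout, and the plain (non-progressive) FC mixing are illustrative assumptions based only on this abstract; the paper's actual multi-scale, chunked MorphFC_s and MorphFC_t layers are in the repository linked above.

    import torch
    import torch.nn as nn

    class MorphBlockSketch(nn.Module):
        """Illustrative spatial-then-temporal FC mixing over (B, T, H, W, C) tokens.
        A hypothetical simplification, not the paper's exact block."""

        def __init__(self, dim, height, width, frames):
            super().__init__()
            self.norm_s = nn.LayerNorm(dim)
            self.fc_w = nn.Linear(width, width)    # token mixing along the width axis
            self.fc_h = nn.Linear(height, height)  # token mixing along the height axis
            self.norm_t = nn.LayerNorm(dim)
            self.fc_t = nn.Linear(frames, frames)  # token mixing across frames

        def forward(self, x):
            # x: (B, T, H, W, C)
            y = self.norm_s(x)
            y = self.fc_w(y.transpose(-1, -2)).transpose(-1, -2)            # mix along W
            y = self.fc_h(y.permute(0, 1, 4, 3, 2)).permute(0, 1, 4, 3, 2)  # mix along H
            x = x + y                                                       # spatial residual
            z = self.norm_t(x)
            z = self.fc_t(z.permute(0, 2, 3, 4, 1)).permute(0, 4, 1, 2, 3)  # mix along T
            return x + z                                                    # temporal residual

    # Shape-preserving usage, e.g. 8 frames of 56x56 tokens with 96 channels:
    # block = MorphBlockSketch(dim=96, height=56, width=56, frames=8)
    # out = block(torch.randn(2, 8, 56, 56, 96))  # -> (2, 8, 56, 56, 96)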

Cite

APA

Zhang, D. J., Li, K., Wang, Y., Chen, Y., Chandra, S., Qiao, Y., … Shou, M. Z. (2022). MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13695 LNCS, pp. 230–248). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19833-5_14
