Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?

Chuanxin Tang; Yucheng Zhao; Guangting Wang; Chong Luo; Wenxuan Xie; Wenjun Zeng

Conference Proceedings, Open Access

Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (2022) 36 2344-2351

DOI: 10.1609/aaai.v36i2.20133

50 Citations

104 Readers

Abstract

Transformers have sprung up in the field of computer vision. In this work, we explore whether the core self-attention module in the Transformer is the key to achieving excellent performance in image recognition. To this end, we build an attention-free network called sMLPNet, based on existing MLP-based vision models. Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module. For 2D image tokens, sMLP applies a 1D MLP along each axial direction, with parameters shared among rows or columns. Through sparse connections and weight sharing, the sMLP module significantly reduces the number of model parameters and the computational complexity, avoiding the over-fitting problem that commonly plagues MLP-like models. When trained only on the ImageNet-1K dataset, the proposed sMLPNet achieves 81.9% top-1 accuracy with only 24M parameters, which is much better than most CNNs and vision Transformers under the same model size constraint. When scaled up to 66M parameters, sMLPNet achieves 83.4% top-1 accuracy, which is on par with the state-of-the-art Swin Transformer. The success of sMLPNet suggests that the self-attention mechanism is not necessarily a silver bullet in computer vision. The code and models are publicly available at https://github.com/microsoft/SPACH.
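The axial, weight-shared token mixing described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the repository linked above for that). It handles a single channel, omits biases and nonlinearities, and all function names are hypothetical. The abstract does not say how the directional branches are combined, so summing them with the input is an assumption made here purely for illustration. The two counting helpers compare one dense linear layer over all H*W tokens with the two shared axial matrices.

```python
def mix_along_height(x, w_h):
    """Apply the same H x H weight matrix to every column of the grid."""
    h, w = len(x), len(x[0])
    return [[sum(w_h[i][k] * x[k][j] for k in range(h)) for j in range(w)]
            for i in range(h)]


def mix_along_width(x, w_w):
    """Apply the same W x W weight matrix to every row of the grid."""
    w = len(x[0])
    return [[sum(w_w[j][k] * row[k] for k in range(w)) for j in range(w)]
            for row in x]


def axial_token_mixing(x, w_h, w_w):
    """Combine identity, height and width branches.

    Assumption: the branches are simply summed; the abstract does not
    specify the fusion step.
    """
    mixed_h = mix_along_height(x, w_h)
    mixed_w = mix_along_width(x, w_w)
    return [[x[i][j] + mixed_h[i][j] + mixed_w[i][j]
             for j in range(len(x[0]))]
            for i in range(len(x))]


def dense_mixing_weights(h, w):
    """Weights in one dense linear layer connecting all H*W tokens."""
    return (h * w) ** 2


def axial_mixing_weights(h, w):
    """Weights in one shared H x H matrix plus one shared W x W matrix."""
    return h * h + w * w
```

Under these simplifying assumptions, a 14 x 14 token grid needs 38,416 weights for dense mixing but only 392 for the two shared axial matrices, which gives a rough sense of the savings that sparse connection and weight sharing provide.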

Cite

Tang, C., Zhao, Y., Wang, G., Luo, C., Xie, W., & Zeng, W. (2022). Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 2344–2351). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i2.20133
