Abstract
Recently, Vision Transformer (ViT) and its variants have demonstrated promising performance on various computer vision tasks. Nevertheless, task-irrelevant information, such as background nuisance and noise in patch tokens, can degrade the performance of ViT-based models. In this paper, we develop the Sufficient Vision Transformer (Suf-ViT) as a new solution to this issue. We propose Sufficiency-Blocks (S-Blocks), applied across the depth of Suf-ViT, to accurately disentangle and discard task-irrelevant information. In addition, to boost the training of Suf-ViT, we formulate a Sufficient-Reduction Loss (SRLoss) leveraging the concept of Mutual Information (MI), which enables Suf-ViT to extract more reliable sufficient representations by removing task-irrelevant information. Extensive experiments on benchmark datasets such as ImageNet, ImageNet-C, and CIFAR-10 indicate that our method achieves state-of-the-art or competitive performance relative to baseline methods. Code is available at: https://github.com/zhicheng2T0/Sufficient-Vision-Transformer.git
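The abstract does not specify the exact form of SRLoss, but a loss that pairs a task term with an MI-based compression penalty can be sketched in the spirit described: cross-entropy keeps the representation sufficient for the task, while a variational KL term (a standard upper-bound proxy for I(X; Z), as in information-bottleneck methods) pressures the model to discard task-irrelevant information. All names and the exact penalty below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F


def sr_loss_sketch(logits, targets, z_mu, z_logvar, beta=1e-3):
    """Hypothetical sketch of a sufficiency/reduction-style objective.

    logits:   (B, num_classes) task predictions from the representation
    targets:  (B,) integer class labels
    z_mu, z_logvar: (B, D) parameters of a Gaussian posterior over the
                    representation z (assumed here for illustration)
    beta:     weight trading off task sufficiency vs. compression
    """
    # Sufficiency term: representation must still solve the task.
    ce = F.cross_entropy(logits, targets)

    # Reduction term: KL( N(mu, sigma^2) || N(0, I) ), averaged over the
    # batch -- a variational upper bound commonly used as a proxy for
    # the mutual information between input and representation.
    kl = 0.5 * (z_mu.pow(2) + z_logvar.exp() - 1.0 - z_logvar).sum(dim=1).mean()

    return ce + beta * kl
```

With `beta=0` the objective reduces to plain cross-entropy; increasing `beta` trades task fit for a more compressed, nuisance-free representation.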
Citation
Cheng, Z., Su, X., Wang, X., You, S., & Xu, C. (2022). Sufficient Vision Transformer. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 190–200). Association for Computing Machinery. https://doi.org/10.1145/3534678.3539322