Sufficient Vision Transformer

Abstract

Vision Transformer (ViT) and its variants have demonstrated promising performance on a variety of computer vision tasks. Nevertheless, task-irrelevant information in patch tokens, such as background nuisance and noise, can degrade the performance of ViT-based models. In this paper, we develop the Sufficient Vision Transformer (Suf-ViT) as a new solution to this issue. We propose Sufficiency-Blocks (S-Blocks), applied across the depth of Suf-ViT, to disentangle and discard task-irrelevant information accurately. In addition, to boost the training of Suf-ViT, we formulate a Sufficient-Reduction Loss (SRLoss) that leverages the concept of Mutual Information (MI), enabling Suf-ViT to extract more reliable sufficient representations by removing task-irrelevant information. Extensive experiments on benchmark datasets such as ImageNet, ImageNet-C, and CIFAR-10 indicate that our method achieves state-of-the-art or competitive performance against other baseline methods. Code is available at: https://github.com/zhicheng2T0/Sufficient-Vision-Transformer.git
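
The abstract pairs a sufficiency objective (retain label-relevant information) with a reduction objective (discard task-irrelevant information), both framed via mutual information. As a rough illustration only, the PyTorch sketch below implements a generic variational information-bottleneck-style surrogate in the same spirit, not the paper's actual SRLoss: cross-entropy serves as a variational lower bound on I(z; y), and a KL term against a standard normal prior serves as an upper bound on I(z; x). All names here (IBStyleLoss, beta, mu, logvar) are illustrative assumptions; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IBStyleLoss(nn.Module):
    """Illustrative variational information-bottleneck-style objective.

    Not the paper's SRLoss; a generic surrogate in the same spirit:
    - cross-entropy lower-bounds I(z; y)  (sufficiency term)
    - KL(q(z|x) || N(0, I)) upper-bounds I(z; x)  (reduction term)
    """

    def __init__(self, beta: float = 1e-3):
        super().__init__()
        self.beta = beta  # trade-off between sufficiency and reduction

    def forward(self, logits, labels, mu, logvar):
        # Sufficiency: standard cross-entropy on the classifier head.
        ce = F.cross_entropy(logits, labels)
        # Reduction: KL of a diagonal Gaussian q(z|x) from N(0, I).
        kl = -0.5 * torch.mean(
            torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        )
        return ce + self.beta * kl


# Hypothetical usage: mu and logvar would come from a small head on a
# token representation (e.g., the [CLS] token); z is reparameterized.
if __name__ == "__main__":
    batch, dim, num_classes = 8, 192, 10
    mu, logvar = torch.randn(batch, dim), torch.randn(batch, dim)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    logits = nn.Linear(dim, num_classes)(z)
    labels = torch.randint(0, num_classes, (batch,))
    loss = IBStyleLoss(beta=1e-3)(logits, labels, mu, logvar)
    print(loss.item())
```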

Citation (APA)

Cheng, Z., Su, X., Wang, X., You, S., & Xu, C. (2022). Sufficient Vision Transformer. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 190–200). Association for Computing Machinery. https://doi.org/10.1145/3534678.3539322
