Vision Transformers are Robust Learners

Sayak Paul; Pin-Yu Chen

Conference Proceedings, Open Access

Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (2022) 36 2071-2081

DOI: 10.1609/aaai.v36i2.20103

138 Citations

201 Readers

Abstract

Transformers, composed of multiple self-attention layers, hold strong promise as a generic learning primitive applicable to different data modalities, and have driven recent breakthroughs in computer vision, achieving state-of-the-art (SOTA) standard accuracy. What remains largely unexplored is the evaluation and attribution of their robustness. In this work, we study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison between ViT models and SOTA convolutional neural networks (CNNs), specifically Big Transfer (BiT). Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. For example, with fewer parameters and similar dataset and pre-training combinations, ViT achieves a top-1 accuracy of 28.10% on ImageNet-A, which is 4.3x higher than that of a comparable BiT variant. Our analyses of image masking, Fourier spectrum sensitivity, and the spread of the discrete cosine energy spectrum reveal intriguing properties of ViT that contribute to its improved robustness. Code for reproducing our experiments is available at https://git.io/J3VO0.
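
The image masking analysis named in the abstract can be illustrated with a minimal sketch. The abstract does not state how masking is performed, so everything below is an assumption made for illustration: the helper name `mask_random_patches`, the non-overlapping square patches, the zero fill value, and the masking fraction are hypothetical and are not taken from the authors' released code. The idea is to blank out a random subset of patches so that a classifier's accuracy can then be compared on original versus masked inputs.

```python
import random


def mask_random_patches(image, patch_size, mask_fraction, seed=0):
    """Return a copy of `image` with a random subset of patches zeroed out.

    `image` is a 2-D list (rows of pixel values) whose height and width are
    multiples of `patch_size`. The patch layout, fill value, and fraction
    are illustrative assumptions, not the paper's protocol.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    # Enumerate the top-left corner of every non-overlapping patch.
    corners = [(r, c)
               for r in range(0, height, patch_size)
               for c in range(0, width, patch_size)]
    num_masked = round(mask_fraction * len(corners))

    rng = random.Random(seed)           # seeded for reproducible masks
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for r, c in rng.sample(corners, num_masked):
        for dr in range(patch_size):
            for dc in range(patch_size):
                masked[r + dr][c + dc] = 0
    return masked


# Usage: an 8x8 image of ones, 4x4 patches, half of the four patches masked.
image = [[1] * 8 for _ in range(8)]
masked = mask_random_patches(image, patch_size=4, mask_fraction=0.5)
remaining = sum(sum(row) for row in masked)  # pixels left unmasked
```

In a robustness study, the masked images would be fed to each model and the drop in top-1 accuracy compared across models; that evaluation loop is omitted here because it depends on the model and dataset in use.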

Cite (APA)

Paul, S., & Chen, P. Y. (2022). Vision Transformers are Robust Learners. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 2071–2081). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v36i2.20103
