Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection

Chenyu Wang; Toshio Endo; Takahiro Hirofuchi; Tsutomu Ikegami

Conference Proceedings (Open Access)

Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (2023), Vol. 5, pp. 583-590

DOI: 10.5220/0011675800003417

Abstract

We present the Pyramid Swin Transformer for image classification and object detection. It takes advantage of more shifted-window operations and of smaller windows in a greater variety of sizes. For object detection, we also add a Feature Pyramid Network, which produces excellent results. The architecture is implemented in four stages that contain layers with different window sizes. We test it on ImageNet classification and COCO detection: the Pyramid Swin Transformer achieves 85.4% accuracy on ImageNet classification and 54.3 box AP on COCO.
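The abstract does not spell out how windows of different sizes are formed, so the following is only a minimal sketch of the general idea behind window partitioning and cyclic shifting in Swin-style models, not the authors' implementation. The 8x8 toy grid, the window sizes (8, 4, 2), the half-window shift, and the helper names `cyclic_shift` and `window_partition` are all assumptions chosen for illustration.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` cells, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def window_partition(grid: Grid, window: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping window x window tiles (row-major)."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            tiles.append([row[c0:c0 + window] for row in grid[r0:r0 + window]])
    return tiles


# Toy 8x8 "feature map" whose entries are simply token indices.
feature_map = [[r * 8 + c for c in range(8)] for r in range(8)]

# Hypothetical window sizes for illustration; not the paper's configuration.
for window in (8, 4, 2):
    regular = window_partition(feature_map, window)
    # Shift by half a window so the new tiles straddle the old tile borders.
    shifted = window_partition(cyclic_shift(feature_map, window // 2), window)
    print(f"window={window}: {len(regular)} regular tiles, {len(shifted)} shifted tiles")
```

In a Swin-style layer, self-attention would then be computed independently inside each tile. Smaller windows give more, cheaper tiles, and the shifted partition lets tokens that sat in different tiles interact in the next layer.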

Cite (APA)

Wang, C., Endo, T., Hirofuchi, T., & Ikegami, T. (2023). Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection. In Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5, pp. 583–590). Science and Technology Publications, Lda. https://doi.org/10.5220/0011675800003417