Global and Local Feature Interaction with Vision Transformer for Few-shot Image Classification

Abstract

Image classification is a classical machine learning task that has been widely applied. Because annotation and data collection are costly in real-world scenarios, few-shot learning has become a vital technique for improving image classification performance. However, most existing few-shot image classification methods model only the global image feature or local image patches, ignoring global-local interactions. In this study, we propose a new method, named GL-ViT, that integrates global and local features to fully exploit the few-shot samples for image classification. First, we design a feature extractor module that computes the interactions between the global representation and local patch embeddings, where ViT is adopted to obtain efficient and effective image representations. Then, Earth Mover's Distance is used to measure the similarity between two images. Extensive experimental results on several widely used open datasets show that GL-ViT significantly outperforms state-of-the-art algorithms, and our ablation studies verify the effectiveness of the global-local feature interaction.
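The Earth Mover's Distance step described above can be sketched as an optimal-transport problem between two sets of patch embeddings. The following is a minimal illustration, not the paper's exact formulation: it assumes uniform patch weights and a cost of one minus cosine similarity, and solves the transport plan with a generic linear-programming solver.

```python
import numpy as np
from scipy.optimize import linprog

def emd_similarity(patches_a, patches_b):
    """Hypothetical sketch of EMD-based image similarity.

    patches_a: (n, d) array of patch embeddings for image A.
    patches_b: (m, d) array of patch embeddings for image B.
    Returns 1 - EMD, so larger values mean more similar images.
    """
    n, m = len(patches_a), len(patches_b)
    # Pairwise cost: 1 - cosine similarity between patch embeddings.
    a = patches_a / np.linalg.norm(patches_a, axis=1, keepdims=True)
    b = patches_b / np.linalg.norm(patches_b, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T  # shape (n, m)
    # Uniform marginals (the paper may derive patch weights differently).
    # Decision variables: flattened transport plan F of shape (n*m,).
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):           # each row of F sums to 1/n
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):           # each column of F sums to 1/m
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([np.full(n, 1.0 / n), np.full(m, 1.0 / m)])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return 1.0 - res.fun
```

For identical patch sets the optimal plan moves no mass across differing patches, so the similarity is 1; for mutually orthogonal patch sets it drops to 0.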

APA

Sun, M., Ma, W., & Liu, Y. (2022). Global and Local Feature Interaction with Vision Transformer for Few-shot Image Classification. In International Conference on Information and Knowledge Management, Proceedings (pp. 4530–4534). Association for Computing Machinery. https://doi.org/10.1145/3511808.3557604
