TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning

Abstract

This paper presents a transformer framework for few-shot learning, termed TransVLAD, with a focus on showing the power of locally aggregated descriptors for few-shot learning. Our TransVLAD model is simple: a standard transformer encoder followed by a NeXtVLAD aggregation module that outputs the locally aggregated descriptors. In contrast to the prevailing use of CNNs as part of the feature extractor, we are the first to show that self-supervised learning such as masked autoencoders (MAE) can address the overfitting of transformers in few-shot image classification, and that few-shot learning benefits from this general-purpose pre-training. We then propose two methods to mitigate two few-shot biases: supervision bias and simple-characteristic bias. The first introduces a masking operation into fine-tuning, which accelerates fine-tuning (by more than 3x) and improves accuracy. The second adapts focal loss into a soft focal loss that focuses learning on hard characteristics. Our TransVLAD tops 10 benchmarks on five popular few-shot datasets by an average of more than 2%.
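To make the described pipeline concrete, below is a minimal PyTorch sketch (not the authors' released code) of the architecture the abstract outlines: a transformer encoder whose patch tokens are aggregated into locally aggregated descriptors by a VLAD-style head. The paper uses NeXtVLAD; the simplified NetVLAD-like aggregation shown here, and all hyperparameters and names (num_clusters, out_dim, dim, TransVLADSketch), are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VLADAggregation(nn.Module):
    """Soft-assign patch tokens to clusters and aggregate their residuals."""

    def __init__(self, dim: int = 768, num_clusters: int = 64, out_dim: int = 640):
        super().__init__()
        self.assign = nn.Linear(dim, num_clusters)                 # soft-assignment logits
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))
        self.reduce = nn.Linear(num_clusters * dim, out_dim)       # compress the descriptor

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings from the transformer encoder
        soft_assign = self.assign(tokens).softmax(dim=-1)          # (B, N, K)
        residuals = tokens.unsqueeze(2) - self.centroids           # (B, N, K, D)
        vlad = (soft_assign.unsqueeze(-1) * residuals).sum(dim=1)  # (B, K, D)
        vlad = F.normalize(vlad, dim=-1)                           # intra-cluster normalization
        vlad = F.normalize(vlad.flatten(1), dim=-1)                # global L2 normalization
        return self.reduce(vlad)                                   # (B, out_dim) descriptor


class TransVLADSketch(nn.Module):
    """Transformer encoder (e.g. an MAE-pre-trained ViT) + VLAD-style aggregation."""

    def __init__(self, encoder: nn.Module, dim: int = 768):
        super().__init__()
        self.encoder = encoder              # assumed to return (B, N, D) patch tokens
        self.aggregate = VLADAggregation(dim=dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        tokens = self.encoder(images)
        return self.aggregate(tokens)
```

A plausible reading of the abstract is that, during fine-tuning, only a random subset of patch tokens is fed through the encoder (the masking operation), which shrinks N and would account for the reported >3x speed-up; the soft focal loss would then be applied on top of the resulting descriptors. The exact formulations are in the paper and are not reproduced here.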

Citation (APA)

Li, H., Zhang, L., Zhang, D., Fu, L., Yang, P., & Zhang, J. (2022). TransVLAD: Focusing on Locally Aggregated Descriptors for Few-Shot Learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13680 LNCS, pp. 524–540). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20044-1_30
