Investigating Self-Supervised Methods for Label-Efficient Learning

0Citations
Citations of this article
7Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks, including classification, segmentation, and detection. However, the potential of these models for low-shot learning across several downstream tasks remains largely under explored. In this work, we conduct a systematic examination of different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, to assess their low-shot capabilities by comparing different pretrained models. In addition, we explore the impact of various collapse avoidance techniques, such as centring, ME-MAX, and sinkhorn, on these downstream tasks. Based on our detailed analysis, we introduce a framework that combines mask image modelling and clustering as pretext tasks. This framework demonstrates superior performance across all examined low-shot downstream tasks, including multi-class classification, multi-label classification and semantic segmentation. Furthermore, when testing the model on large-scale datasets, we show performance gains in various tasks.

Cite

CITATION STYLE

APA

Nandam, S. R., Atito, S., Feng, Z., Kittler, J., & Awais, M. (2025). Investigating Self-Supervised Methods for Label-Efficient Learning. International Journal of Computer Vision, 133(7), 4522–4537. https://doi.org/10.1007/s11263-025-02397-4

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free