Abstract
The reliance on large-scale datasets and extensive computational resources has become a significant barrier to advancing representation learning from images, particularly in domains where data is scarce or expensive to obtain. In this paper, we address the critical question: Can we escape the big data paradigm in self-supervised representation learning from images? We introduce SCOTT (Sparse Convolutional Tokenizer for Transformers), a shallow tokenization architecture that is compatible with Masked Image Modeling (MIM) tasks. SCOTT injects convolutional inductive biases into Vision Transformers (ViTs), enhancing their efficacy in small-scale data regimes. In addition, we propose training with a Joint-Embedding Predictive Architecture within a MIM framework (MIM-JEPA), which operates in latent representation space to capture more semantic features. Our approach enables ViTs to be trained from scratch on datasets orders of magnitude smaller than traditionally required, without relying on massive external datasets for pretraining. We validate our method on three small-size, standard-resolution, fine-grained datasets: Oxford Flowers-102, Oxford-IIIT Pets-37, and ImageNet-100. Despite the limited data and the high intra-class similarity of these datasets, our frozen SCOTT models pretrained with MIM-JEPA significantly outperform fully supervised methods and achieve results competitive with state-of-the-art approaches that rely on large-scale pretraining, complex image augmentations, and larger models. By demonstrating that robust off-the-shelf representations can be learned with limited data, compute, and model size, our work paves the way for computer applications in resource-constrained environments such as medical imaging or robotics.
Vélez-García, C., Cazorla, M., & Pomares, J. (2026). Escaping the big data paradigm in self-supervised representation learning. Computer Vision and Image Understanding, 266. https://doi.org/10.1016/j.cviu.2026.104698