Abstract
Sketch-based Image Retrieval (SBIR) is a prevalent task in computer vision in which a model must learn a bimodal sketch-photo feature space. Training such a model typically requires sketch-photo pairs to fit a bimodal neural network. However, accessing paired data is impractical in real scenarios, such as eCommerce search engines. To address this problem, we can leverage self-supervised learning strategies to learn the sketch-photo space, a direction that has yet to be explored. This work therefore presents a study of diverse self-supervised methodologies adapted to the SBIR domain. Here, self-supervised means the model never accesses real sketch-photo pairs; instead, training relies only on pseudo-sketches generated on the fly. To our knowledge, this study is the first to explore diverse self-supervised mechanisms for SBIR (S3BIR). Our results show the outstanding performance of contrastive models like SimCLR and CLIP adapted to SBIR under a self-supervised regime. S3BIR-CLIP is the most effective model, achieving a mAP of 54.03% on Flickr15K, 45.38% on eCommerce, and 13.80% on QD. On the eCommerce dataset, we improve performance by around 20% over previously published results. In terms of resource consumption, however, S3BIR-SimCLR is the most competitive.
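The core idea, training on photo/pseudo-sketch positive pairs with a contrastive objective, can be sketched as follows. This is a minimal, hedged illustration of a symmetric InfoNCE loss (the objective family used by SimCLR and CLIP), not the paper's exact implementation; the embeddings, batch size, and temperature value here are illustrative assumptions.

```python
import numpy as np

def info_nce(photo_emb, sketch_emb, temperature=0.07):
    """Symmetric InfoNCE loss over photo / pseudo-sketch embeddings.

    Row i of each matrix is assumed to embed the same underlying photo:
    the photo itself and a pseudo-sketch generated from it, so no real
    sketch-photo pairs are needed (the self-supervised setting).
    """
    # L2-normalise so dot products become cosine similarities.
    p = photo_emb / np.linalg.norm(photo_emb, axis=1, keepdims=True)
    s = sketch_emb / np.linalg.norm(sketch_emb, axis=1, keepdims=True)
    logits = p @ s.T / temperature      # (N, N); positives on the diagonal
    targets = np.arange(len(p))

    def xent(l):
        # Numerically stable cross-entropy against the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[targets, targets].mean()

    # Average the photo->sketch and sketch->photo directions.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
# Pseudo-sketch embeddings close to their photos -> low loss.
loss_aligned = info_nce(emb, emb + 0.01 * rng.normal(size=(4, 8)))
# Unrelated embeddings -> higher loss.
loss_random = info_nce(emb, rng.normal(size=(4, 8)))
```

In this framing, the pseudo-sketch generator supplies the "labels": each generated sketch is the positive for its source photo, and all other photos in the batch act as negatives.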
Citation
Campos, W., Saavedra, J. M., & Stears, C. (2025). A study on self-supervised sketch-based image retrieval on unpaired data (S3BIR). Neural Computing and Applications, 37(18), 11945–11963. https://doi.org/10.1007/s00521-025-11142-4