A study on self-supervised sketch-based image retrieval on unpaired data (S3BIR)

Abstract

Sketch-based Image Retrieval (SBIR) is a prevalent task in computer vision in which a model must produce a bimodal sketch-photo feature space. Training such a model normally requires sketch-photo pairs to fit a bimodal neural network; however, paired data is impractical to obtain in real scenarios, such as eCommerce search engines. To address this problem, we can leverage self-supervised learning strategies to learn the sketch-photo space, a direction that has yet to be explored. This work therefore presents a study of the performance of diverse self-supervised methodologies adapted to the SBIR domain. Here, self-supervised means the model never accesses real sketch-photo pairs; training instead relies only on pseudo-sketches generated during training. To our knowledge, this study is the first to explore diverse self-supervised mechanisms for SBIR (S3BIR). Our results show the outstanding performance of contrastive models like SimCLR and CLIP adapted to SBIR under a self-supervised regime. S3BIR-CLIP is the most effective model, achieving a mAP of 54.03% on Flickr15K, 45.38% on eCommerce, and 13.80% on QD; on the eCommerce dataset, this improves on previously published results by around 20%. In terms of resource consumption, however, S3BIR-SimCLR is the most competitive.
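The contrastive objective behind models such as the SimCLR and CLIP variants mentioned above can be illustrated with a small sketch. The snippet below is a minimal, illustrative implementation of a CLIP-style symmetric InfoNCE loss over a batch of photo and pseudo-sketch embeddings, where matching (photo, pseudo-sketch) pairs are positives and all other in-batch combinations are negatives; the function name, temperature value, and details are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def clip_style_contrastive_loss(photo_emb, sketch_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of (photo_i, pseudo_sketch_i)
    embedding pairs: diagonal entries of the similarity matrix are the
    positives, every other in-batch combination is a negative."""
    # L2-normalize so dot products become cosine similarities
    p = photo_emb / np.linalg.norm(photo_emb, axis=1, keepdims=True)
    s = sketch_emb / np.linalg.norm(sketch_emb, axis=1, keepdims=True)
    logits = p @ s.T / temperature        # (B, B) similarity matrix
    labels = np.arange(len(p))            # positives lie on the diagonal

    def xent(l):
        # row-wise softmax cross-entropy against the diagonal labels
        l = l - l.max(axis=1, keepdims=True)   # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the photo->sketch and sketch->photo directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

As expected of a contrastive objective, the loss is near zero when photo and pseudo-sketch embeddings align pairwise and rises when they are unrelated.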

Citation (APA)

Campos, W., Saavedra, J. M., & Stears, C. (2025). A study on self-supervised sketch-based image retrieval on unpaired data (S3BIR). Neural Computing and Applications, 37(18), 11945–11963. https://doi.org/10.1007/s00521-025-11142-4
