Black-box Prompt Tuning for Vision-Language Model as a Service


Abstract

In the scenario of Model-as-a-Service (MaaS), pre-trained models are usually released as inference APIs, and users can query them only with manually crafted prompts. Without access to the network structure and gradient information, continuous prompt tuning is difficult to perform on MaaS, especially for vision-language models (VLMs), which involve cross-modal interaction. In this paper, we propose a black-box prompt tuning framework for VLMs that learns task-relevant prompts without back-propagation. In particular, the vision and language prompts are jointly optimized in an intrinsic parameter subspace with various evolution strategies. Different prompt variants are also explored to enhance cross-modal interaction. Experimental results show that our black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, which serves as evidence of its capability to learn task-relevant prompts in a derivative-free manner.
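The core idea of the abstract — optimizing prompts in a low-dimensional intrinsic subspace with an evolution strategy, using only black-box loss queries — can be sketched as follows. This is an illustrative toy, not the paper's exact algorithm: `query_api` is a stand-in for a real VLM inference endpoint, and the random projection plus simple (mu, lambda)-style ES are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

PROMPT_DIM = 64    # dimensionality of the (frozen) prompt embedding space
INTRINSIC_DIM = 8  # low-dimensional intrinsic subspace actually optimized

# Fixed random projection from the intrinsic subspace to prompt space.
A = rng.standard_normal((PROMPT_DIM, INTRINSIC_DIM)) / np.sqrt(INTRINSIC_DIM)

def query_api(prompt_vec):
    """Stand-in for the black-box model API: returns a scalar task loss.
    Here: squared distance to a hidden 'optimal' prompt embedding."""
    target = np.full(PROMPT_DIM, 0.5)
    return float(np.sum((prompt_vec - target) ** 2))

def es_step(z, sigma, pop=20):
    """One step of a simple evolution strategy: sample a population of
    perturbed intrinsic vectors, score them via the black-box API, and
    recombine the best quarter. No gradients are ever computed."""
    noise = rng.standard_normal((pop, z.size))
    candidates = z + sigma * noise
    losses = np.array([query_api(A @ c) for c in candidates])
    elite = candidates[np.argsort(losses)[: pop // 4]]
    return elite.mean(axis=0), float(losses.min())

z = np.zeros(INTRINSIC_DIM)          # intrinsic prompt parameters
for step in range(100):
    z, best = es_step(z, sigma=0.3)

print(f"best loss after tuning: {best:.4f}")
```

Because the search happens in the 8-dimensional intrinsic space rather than the full prompt embedding space, the number of black-box queries stays manageable; in the paper's setting, the vision and language prompt subspaces would be optimized jointly under the same budget.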

Cite

APA

Yu, L., Chen, Q., Lin, J., & He, L. (2023). Black-box Prompt Tuning for Vision-Language Model as a Service. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2023-August, pp. 1686–1694). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2023/187
