Abstract
In the Model-as-a-Service (MaaS) scenario, pre-trained models are usually released as inference APIs, and users query them with manually crafted prompts. Without access to the network structure and gradient information, performing continuous prompt tuning on MaaS is difficult, especially for vision-language models (VLMs), which must account for cross-modal interaction. In this paper, we propose a black-box prompt tuning framework for VLMs that learns task-relevant prompts without back-propagation. In particular, the vision and language prompts are jointly optimized in an intrinsic parameter subspace with various evolution strategies. Different prompt variants are also explored to enhance the cross-modal interaction. Experimental results show that our proposed black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, demonstrating its ability to train task-relevant prompts in a derivative-free manner.
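The core idea described above can be illustrated with a minimal sketch: a low-dimensional intrinsic vector is projected into the prompt embedding space by a fixed random matrix, and a simple evolution strategy searches that subspace using only forward queries. All names here (`query_model`, the quadratic stand-in loss, the (1+λ) strategy) are illustrative assumptions, not the authors' actual implementation or the evolution strategies used in the paper.

```python
import numpy as np

# Hypothetical sketch of derivative-free prompt tuning in an intrinsic
# subspace. The black-box model is faked with a quadratic loss so the
# example runs end to end without any real inference API.

rng = np.random.default_rng(0)

D_PROMPT = 64     # dimensionality of a (flattened) continuous prompt
D_INTRINSIC = 8   # low-dimensional subspace that is actually searched

# Fixed random projection from the intrinsic subspace to prompt space.
A = rng.standard_normal((D_PROMPT, D_INTRINSIC)) / np.sqrt(D_INTRINSIC)

# Stand-in for the black-box inference API: returns a task loss for a prompt.
target = rng.standard_normal(D_PROMPT)
def query_model(prompt: np.ndarray) -> float:
    return float(np.sum((prompt - target) ** 2))

def loss(z: np.ndarray) -> float:
    # Only forward queries are used; no gradients flow through the model.
    return query_model(A @ z)

def evolve(steps: int = 200, lam: int = 10, sigma: float = 0.3) -> np.ndarray:
    # A basic (1+lambda) evolution strategy: perturb, evaluate, keep the best.
    z = np.zeros(D_INTRINSIC)
    best = loss(z)
    for _ in range(steps):
        candidates = z + sigma * rng.standard_normal((lam, D_INTRINSIC))
        losses = [loss(c) for c in candidates]
        i = int(np.argmin(losses))
        if losses[i] < best:
            z, best = candidates[i], losses[i]
    return z

z_star = evolve()
```

Searching the 8-dimensional intrinsic vector rather than the full 64-dimensional prompt is what keeps the query budget of a derivative-free optimizer manageable; the paper applies this idea jointly to vision and language prompts.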
Citation
Yu, L., Chen, Q., Lin, J., & He, L. (2023). Black-box Prompt Tuning for Vision-Language Model as a Service. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2023-August, pp. 1686–1694). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2023/187