Modular and Parameter-Efficient Multimodal Fusion with Prompting

16 citations · 73 Mendeley readers

Abstract

Recent research has made impressive progress in large-scale multimodal pre-training. As model sizes grow rapidly, efficient and flexible alternatives to fine-tuning become necessary. In this paper, we propose using prompt vectors to align the modalities. Our method achieves performance comparable to several other multimodal fusion methods in low-resource settings. We further show that our method is modular and parameter-efficient for processing tasks involving two or more data modalities.
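To make the idea concrete, below is a minimal PyTorch sketch of prompt-based multimodal fusion in the spirit of the abstract; it is not the authors' released implementation. The encoder interfaces (a vision encoder returning a sequence of features, a language model accepting `inputs_embeds`), the feature dimensions, and the prompt length are all assumptions chosen for illustration. Both backbones stay frozen, so only the prompt vectors and a small projection layer are trained.

```python
import torch
import torch.nn as nn


class PromptFusion(nn.Module):
    """Sketch: fuse a frozen vision encoder and a frozen LM via prompts.

    Only `self.prompts` and `self.vis_proj` receive gradients; the
    backbones are untouched, which is what makes the approach
    parameter-efficient.
    """

    def __init__(self, text_encoder, vision_encoder,
                 num_prompts=20, vis_dim=2048, txt_dim=768):
        super().__init__()
        self.text_encoder = text_encoder      # frozen language model (assumed interface)
        self.vision_encoder = vision_encoder  # frozen image encoder (assumed interface)
        for module in (self.text_encoder, self.vision_encoder):
            for p in module.parameters():
                p.requires_grad = False

        # Trainable prompt vectors that learn to align the modalities.
        self.prompts = nn.Parameter(0.02 * torch.randn(num_prompts, txt_dim))
        # Map visual features into the LM's token-embedding space.
        self.vis_proj = nn.Linear(vis_dim, txt_dim)

    def forward(self, image, text_embeds):
        # image: (B, C, H, W); text_embeds: (B, T, txt_dim) token embeddings.
        vis_feats = self.vision_encoder(image)   # (B, V, vis_dim), assumed shape
        vis_embeds = self.vis_proj(vis_feats)    # (B, V, txt_dim)
        prompts = self.prompts.unsqueeze(0).expand(image.size(0), -1, -1)
        # The frozen LM sees [prompt tokens; visual tokens; text tokens].
        fused = torch.cat([prompts, vis_embeds, text_embeds], dim=1)
        return self.text_encoder(inputs_embeds=fused)
```

Because the backbones never change, extending this sketch to a third modality only requires another projection and additional prompt rows, which reflects the modularity the abstract claims.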

Citation (APA)

Liang, S., Zhao, M., & Schütze, H. (2022). Modular and parameter-efficient multimodal fusion with prompting. In Findings of the Association for Computational Linguistics: ACL 2022 (pp. 2976–2985). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.234
