Visual Instruction Tuning with Polite Flamingo

Delong Chen; Jianfeng Liu; Wenliang Dai; Baoyuan Wang

Conference Proceedings, Open Access

Proceedings of the AAAI Conference on Artificial Intelligence (2024) 38(16) 17745-17753

DOI: 10.1609/aaai.v38i16.29727

Abstract

Recent research has demonstrated that multi-task fine-tuning of multi-modal Large Language Models (LLMs) on an assortment of annotated vision-language datasets significantly enhances their performance. Yet this process has a side effect, which we term the “multi-modal alignment tax”: because raw annotations are overly succinct and unformatted, fine-tuning on them degrades the model’s ability to format its responses appropriately (for instance, its “politeness”), which reduces human preference. In this paper, we introduce Polite Flamingo, a multi-modal response rewriter that transforms raw annotations into a more appealing, “polite” format. Polite Flamingo is trained to reconstruct high-quality responses from automatically distorted versions of them, and is then applied to a vast array of vision-language datasets to rewrite their responses. After rigorous filtering, we obtain the PF-1M dataset and validate its value by fine-tuning a multi-modal LLM on it. Combined with novel methods including U-shaped multi-stage tuning and multi-turn augmentation, the resulting model, Clever Flamingo, demonstrates advantages in both multi-modal understanding and response politeness according to automated and human evaluations. Code and dataset are available at https://github.com/ChenDelong1999/politeflamingo
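The abstract states that Polite Flamingo learns to reconstruct high-quality responses from automatically distorted counterparts. The sketch below illustrates, under stated assumptions, how such (distorted input, polished target) training pairs could be built from text alone. The distortion functions (`strip_formatting`, `keep_first_sentence`, `lowercase_no_punct`) and the helper `make_reconstruction_pairs` are hypothetical illustrations, not the paper's actual distortion set, and the image conditioning of the real multi-modal rewriter is omitted entirely.

```python
import random
import re
from typing import Callable, List, Tuple

# HYPOTHETICAL distortions: each makes a polished response look more like a
# terse, unformatted raw annotation. The paper's real distortions may differ.

def strip_formatting(text: str) -> str:
    """Remove bullet/number list markers and collapse all whitespace."""
    text = re.sub(r"^[ \t]*(?:[-*]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    return " ".join(text.split())

def keep_first_sentence(text: str) -> str:
    """Keep only the first sentence, mimicking an overly succinct label."""
    match = re.match(r"[^.!?]*[.!?]", text.strip())
    return match.group(0).strip() if match else text.strip()

def lowercase_no_punct(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())

Distortion = Callable[[str], str]
DISTORTIONS: List[Distortion] = [strip_formatting, keep_first_sentence, lowercase_no_punct]

def make_reconstruction_pairs(
    responses: List[str],
    distortions: List[Distortion] = DISTORTIONS,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (distorted_source, original_target) pairs for rewriter training."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    pairs: List[Tuple[str, str]] = []
    for target in responses:
        # Apply a random non-empty subset of distortions in random order.
        chosen = rng.sample(distortions, rng.randint(1, len(distortions)))
        source = target
        for fn in chosen:
            source = fn(source)
        # Skip samples the distortions left unchanged (nothing to reconstruct).
        if source and source != target:
            pairs.append((source, target))
    return pairs
```

For example, `make_reconstruction_pairs(["Sure! Here is what I see:\n- A red bus."])` yields one pair whose target is the original formatted text and whose source is a flattened or truncated version of it. A rewriter trained on such pairs would then be applied in the opposite direction, taking real raw annotations as input.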

Citation (APA)

Chen, D., Liu, J., Dai, W., & Wang, B. (2024). Visual Instruction Tuning with Polite Flamingo. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 17745–17753). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i16.29727