KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

Zhiwei Jia; Garima Pruthi; Pradyumna Narayana; Arjun R. Akula; Hao Su; Sugato Basu; Varun Jampani

Conference Proceedings

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023), Vol. 5, pp. 772–785

DOI: 10.18653/v1/2023.acl-industry.74

Abstract

Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper, we perform the first empirical study of image ad understanding through the lens of pre-trained VLMs. We benchmark and reveal practical challenges in adapting these VLMs to image ad understanding. We propose a simple feature adaptation strategy to effectively fuse multimodal information for image ads and further empower it with knowledge of real-world entities. We hope our study draws more attention to image ad understanding, which is broadly relevant to the advertising industry.

Citation (APA)

Jia, Z., Pruthi, G., Narayana, P., Akula, A. R., Su, H., Basu, S., & Jampani, V. (2023). KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 5, pp. 772–785). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-industry.74
