Visually-augmented Pretrained Language Models for NLP Tasks without Images

Abstract

Although pre-trained language models (PLMs) have shown impressive performance through text-only self-supervised training, they have been found to lack visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also apply the augmentation to the whole input text, without considering whether it is actually needed for specific inputs or tasks. To address these issues, we propose a novel Visually-Augmented fine-tuning approach that can be generally applied to various PLMs and NLP tasks, Without using any retrieved or generated Images, namely VAWI. Experimental results show that our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales, and outperform several competitive baselines on ten tasks. Our code and data are publicly available at https://github.com/RUCAIBox/VAWI.
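The abstract does not spell out how visual semantics are injected without images. The sketch below is only one plausible reading, assuming a frozen CLIP text encoder serves as a text-only proxy for visual features whose output is fused with a PLM's sentence representation during fine-tuning; the model names, fusion layer, and classification head are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch (not the paper's confirmed architecture): augment a PLM's
# sentence representation with "visual" features obtained from a frozen CLIP
# text encoder, so no image retrieval or generation is needed.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, CLIPTokenizer, CLIPTextModel


class VisuallyAugmentedClassifier(nn.Module):
    def __init__(self, plm_name="bert-base-uncased",
                 clip_name="openai/clip-vit-base-patch32", num_labels=2):
        super().__init__()
        self.plm = AutoModel.from_pretrained(plm_name)
        self.clip_text = CLIPTextModel.from_pretrained(clip_name)
        self.clip_text.requires_grad_(False)  # keep the CLIP text encoder frozen
        plm_dim = self.plm.config.hidden_size          # e.g. 768 for BERT-base
        clip_dim = self.clip_text.config.hidden_size   # e.g. 512 for CLIP ViT-B/32
        # Simple concat-then-project fusion; the real method may differ.
        self.fuse = nn.Linear(plm_dim + clip_dim, plm_dim)
        self.classifier = nn.Linear(plm_dim, num_labels)

    def forward(self, plm_inputs, clip_inputs):
        text_repr = self.plm(**plm_inputs).last_hidden_state[:, 0]  # [CLS] token
        with torch.no_grad():
            # Text-only stand-in for visual semantics.
            visual_repr = self.clip_text(**clip_inputs).pooler_output
        fused = torch.tanh(self.fuse(torch.cat([text_repr, visual_repr], dim=-1)))
        return self.classifier(fused)


if __name__ == "__main__":
    plm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
    sentence = "A red apple lies on the wooden table."
    model = VisuallyAugmentedClassifier()
    logits = model(plm_tok(sentence, return_tensors="pt"),
                   clip_tok(sentence, return_tensors="pt"))
    print(logits.shape)  # torch.Size([1, 2])
```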

Cite

APA

Guo, H., Zhou, K., Zhao, W. X., Zhang, Q., & Wen, J. R. (2023). Visually-augmented Pretrained Language Models for NLP Tasks without Images. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 14912–14929). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.833
