Visually-Enhanced Phrase Understanding


Abstract

Large-scale vision-language pre-training has exhibited strong performance on a variety of visual and textual understanding tasks. Recently, the textual encoders of multi-modal pre-trained models have been shown to produce high-quality textual representations that often outperform purely text-based models such as BERT. In this study, we leverage both the textual and visual encoders of multi-modal pre-trained models to enhance language understanding: for each phrase, we generate an image from a textual prompt and use it to enrich the phrase representation for downstream tasks. Experiments on four benchmark datasets demonstrate that the proposed visually-enhanced text representations significantly improve performance on the entity clustering task.
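
The abstract does not name specific models, so the following is only a minimal sketch of the described pipeline, assuming CLIP as the multi-modal encoder and Stable Diffusion as the text-to-image generator; the prompt template, concatenation fusion, and K-means clustering are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch only: encode a phrase with a multi-modal text encoder,
# generate an image for it, encode that image, and fuse the two views for clustering.
import torch
from transformers import CLIPModel, CLIPProcessor
from diffusers import StableDiffusionPipeline
from sklearn.cluster import KMeans

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

def visually_enhanced_embedding(phrase: str) -> torch.Tensor:
    # 1. Text embedding from the multi-modal text encoder.
    text_inputs = processor(text=[phrase], return_tensors="pt", padding=True)
    text_emb = clip.get_text_features(**text_inputs)

    # 2. Generate an image from a textual prompt built around the phrase
    #    (the prompt template is an assumption).
    image = sd(prompt=f"a photo of {phrase}").images[0]

    # 3. Image embedding from the multi-modal vision encoder.
    image_inputs = processor(images=image, return_tensors="pt")
    image_emb = clip.get_image_features(**image_inputs)

    # 4. Fuse the two views; simple concatenation of L2-normalized vectors here.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    return torch.cat([text_emb, image_emb], dim=-1).squeeze(0)

# Entity clustering over a toy set of phrases (illustrative only).
phrases = ["golden retriever", "siamese cat", "boeing 747", "airbus a380"]
embeddings = torch.stack(
    [visually_enhanced_embedding(p) for p in phrases]
).detach().numpy()
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(dict(zip(phrases, labels)))
```

Clustering over the concatenated text-image vectors is one way to realize "visually-enhanced" phrase representations; the paper's actual fusion strategy and evaluation protocol may differ.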

Citation (APA)

Hsu, T. Y., Li, C. A., Huang, C. W., & Chen, Y. N. (2023). Visually-Enhanced Phrase Understanding. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 5879–5888). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.363
