Retrieving Multimodal Information for Augmented Generation: A Survey

Abstract

As Large Language Models (LLMs) become popular, an important trend has emerged: using multimodality to augment LLMs' generation ability, enabling them to better interact with the world. However, there is no unified view of at which stage and how different modalities should be incorporated. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge, in formats ranging from images, code, tables, and graphs to audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. By providing an in-depth review, this survey aims to give scholars a deeper understanding of these methods' applications and to encourage them to adapt existing techniques to the fast-growing field of LLMs.
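
To make the retrieve-then-generate pipeline concrete, below is a minimal Python sketch of multimodal retrieval-augmented generation. It is an illustration under stated assumptions, not any specific method from the survey: the embedding functions are hypothetical stand-ins for a joint text/multimodal encoder (e.g., a CLIP-style model), the corpus items are invented, and the retrieved evidence is simply prepended to the prompt before it would be handed to an LLM.

import hashlib
import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Hypothetical encoder: a deterministic pseudo-random unit vector
    stands in for a real joint text/multimodal embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(64)
    return v / np.linalg.norm(v)

def embed_item(item: dict) -> np.ndarray:
    # A real system would dispatch on modality (image, table, code, audio, ...);
    # in this sketch every item is represented by its text description.
    return embed_text(item["description"])

def retrieve(query: str, corpus: list[dict], k: int = 2) -> list[dict]:
    """Rank multimodal items by cosine similarity to the query embedding."""
    q = embed_text(query)
    return sorted(corpus, key=lambda it: float(q @ embed_item(it)), reverse=True)[:k]

def augment_prompt(query: str, retrieved: list[dict]) -> str:
    """Prepend the retrieved evidence to the query before generation."""
    context = "\n".join(f"[{it['modality']}] {it['description']}" for it in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    {"modality": "image", "description": "photo of the Eiffel Tower at night"},
    {"modality": "table", "description": "heights of famous towers in metres"},
    {"modality": "code",  "description": "snippet that parses tower records"},
]
query = "How tall is the Eiffel Tower?"
prompt = augment_prompt(query, retrieve(query, corpus))
print(prompt)  # this augmented prompt would then be passed to an LLM

Because the stand-in embeddings are random unit vectors, the ranking in this sketch is arbitrary; swapping in a real multimodal encoder is what makes the cosine-similarity ranking meaningful.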

Citation (APA)

Zhao, R., Chen, H., Wang, W., Jiao, F., Do, X. L., Qin, C., … Joty, S. (2023). Retrieving Multimodal Information for Augmented Generation: A Survey. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4736–4756). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.314
