FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning


Abstract

Large pre-trained models are capable of few-shot in-context learning (ICL), i.e., performing a new task by prepending a few demonstrations before the test input. However, the concatenated demonstrations are often excessively long and induce additional computation. Inspired by fusion-in-decoder (FiD) models, which efficiently aggregate more passages than concatenation-based models and thus outperform them in open-domain QA, we hypothesize that similar techniques can be applied to improve the efficiency and end-task performance of ICL. To verify this, we present a comprehensive study on applying three fusion methods to ICL: concatenation-based (early fusion), FiD (intermediate fusion), and ensemble-based (late fusion). We adopt a meta-learning setup where a model is first trained to perform ICL on a mixture of tasks using one selected fusion method, then evaluated on held-out tasks. Results on 11 held-out tasks show that FiD-ICL matches or outperforms the other two fusion methods. Additionally, we show that FiD-ICL (1) is 10x faster at inference time than concatenation-based and ensemble-based ICL, since the representations of in-context examples can be precomputed once and reused; and (2) enables scaling up to meta-training 3B-sized models, which would fail for concatenation-based ICL.
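
The mechanism the abstract describes, encoding each in-context example separately so its representation can be precomputed and reused, then fusing all encoder outputs in the decoder's cross-attention, can be sketched with an off-the-shelf encoder-decoder model. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the model choice, prompt format, and variable names are assumptions, and it presumes a recent Hugging Face transformers version that accepts precomputed encoder_outputs in generate().

```python
# Minimal sketch of FiD-style fusion for in-context learning (illustrative only).
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

model_name = "t5-small"  # assumption: any T5-style encoder-decoder works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

# Hypothetical few-shot demonstrations and test input.
demonstrations = [
    "review: great movie, loved it. sentiment: positive",
    "review: boring and far too long. sentiment: negative",
]
test_input = "review: a delightful surprise. sentiment:"

with torch.no_grad():
    # 1) Encode each in-context example independently. These states can be
    #    precomputed once per task and reused for every test input.
    states, masks = [], []
    for demo in demonstrations:
        enc = tokenizer(demo, return_tensors="pt")
        out = model.encoder(input_ids=enc.input_ids, attention_mask=enc.attention_mask)
        states.append(out.last_hidden_state)
        masks.append(enc.attention_mask)

    # 2) Encode the test input the same way.
    enc = tokenizer(test_input, return_tensors="pt")
    out = model.encoder(input_ids=enc.input_ids, attention_mask=enc.attention_mask)
    states.append(out.last_hidden_state)
    masks.append(enc.attention_mask)

    # 3) Fuse in the decoder: concatenate encoder states along the sequence axis
    #    so the decoder cross-attends jointly over demonstrations and test input.
    fused_states = torch.cat(states, dim=1)
    fused_mask = torch.cat(masks, dim=1)

    generated = model.generate(
        encoder_outputs=BaseModelOutput(last_hidden_state=fused_states),
        attention_mask=fused_mask,
        max_new_tokens=5,
    )

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In this sketch, concatenation-based (early) fusion would instead join the raw texts before a single encoder pass, while ensemble-based (late) fusion would run the decoder once per demonstration and combine the output distributions; FiD sits in between by sharing a single decoding pass over separately encoded inputs.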

Citation (APA)

Ye, Q., Beltagy, I., Peters, M. E., Ren, X., & Hajishirzi, H. (2023). FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 8158–8185). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.454
