Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks


Abstract

Although large language models have exhibited impressive zero-shot abilities, their huge size generally incurs high costs. Recently, semi-parametric language models, which augment a smaller language model with retrieved background knowledge, have alleviated the need to store all knowledge in the model parameters. Although existing semi-parametric language models have demonstrated promising language modeling capabilities, it remains unclear whether they can exhibit zero-shot abilities competitive with their fully-parametric counterparts. In this work, we introduce Zemi, a semi-parametric language model for zero-shot task generalization. To the best of our knowledge, this is the first semi-parametric language model to demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train Zemi with semi-parametric multitask training, which yields significant improvements over the parametric multitask training proposed by T0 (Sanh et al., 2021). Specifically, during both training and inference, Zemi is equipped with a retrieval system based on the unlabeled pretraining corpus of our backbone model. To address the unique challenges of large-scale retrieval, we further propose a novel retrieval-augmentation fusion module that can effectively incorporate noisy retrieved documents. Finally, we present detailed analyses and ablation studies of the key ingredients for building effective zero-shot semi-parametric language models. Notably, our proposed Zemi-LARGE model outperforms T0-3B by 16% across seven diverse evaluation tasks while being 3.8x smaller in scale.
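The abstract describes fusing retrieved, possibly noisy documents into a smaller backbone model. As a rough illustration only, and not the paper's actual architecture, the PyTorch sketch below shows one common way such fusion can work: cross-attend from the encoded task input to encoded retrieved passages and gate the result so unhelpful retrievals can be suppressed. All module names, shapes, and hyperparameters here are hypothetical.

```python
# Illustrative sketch of gated retrieval-augmentation fusion (hypothetical,
# not the authors' implementation). Assumes PyTorch is installed.
import torch
import torch.nn as nn


class RetrievalFusion(nn.Module):
    """Cross-attention from input states to encoded retrieved documents,
    gated per position so noisy retrievals can be down-weighted."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, 1)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, input_states: torch.Tensor, doc_states: torch.Tensor) -> torch.Tensor:
        # input_states: (batch, src_len, d_model)     -- encoded task input
        # doc_states:   (batch, k * doc_len, d_model) -- encoded retrieved docs
        attended, _ = self.cross_attn(input_states, doc_states, doc_states)
        # Scalar gate per input position controls how much retrieved evidence to mix in.
        g = torch.sigmoid(self.gate(torch.cat([input_states, attended], dim=-1)))
        return self.norm(input_states + g * attended)


if __name__ == "__main__":
    fusion = RetrievalFusion(d_model=512)
    x = torch.randn(2, 32, 512)         # encoded prompt: batch=2, 32 tokens
    docs = torch.randn(2, 4 * 64, 512)  # 4 retrieved passages x 64 tokens each
    print(fusion(x, docs).shape)        # torch.Size([2, 32, 512])
```

In this sketch the gate lets the model fall back to the plain input representation when retrieved passages are irrelevant, which is one plausible reading of "effectively incorporate noisy retrieved documents"; the paper's actual fusion module should be consulted for the real design.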

Cite

CITATION STYLE

APA

Wang, Z., Pan, X., Yu, D., Yu, D., Chen, J., & Ji, H. (2023). Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 3978–4004). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.246
