MaChAmp at SemEval-2023 tasks 2, 3, 4, 5, 7, 8, 9, 10, 11, and 12: On the Effectiveness of Intermediate Training on an Uncurated Collection of Datasets

Abstract

To improve the ability of language models to handle Natural Language Processing (NLP) tasks, an intermediate step of pre-training has recently been introduced. In this setup, one takes a pre-trained language model, trains it on a (set of) NLP dataset(s), and then fine-tunes it for a target task. It is known that the selection of relevant transfer tasks is important, but recent work has shown substantial performance gains from intermediate training on a very large set of datasets. Most previous work uses generative language models, or focuses on only one or a few tasks with a carefully curated setup. We compare intermediate training with one or many tasks in a setup where the choice of datasets is more arbitrary; we use all SemEval 2023 text-based tasks. We obtain performance improvements for most tasks when using intermediate training. Gains are higher when doing intermediate training on a single task than on all tasks, provided the right transfer task is identified. Dataset smoothing and heterogeneous batching did not lead to robust gains in our setup.
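The abstract mentions dataset smoothing as one ingredient of multi-task intermediate training: when many transfer datasets of very different sizes are mixed, smoothing rebalances how often each one contributes a batch. Below is a minimal sketch, assuming the common formulation where dataset i is sampled with probability proportional to its size raised to a smoothing exponent alpha; the exponent value, sizes, and helper name are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def smoothed_sampling_probs(dataset_sizes, alpha=0.5):
    # p_i is proportional to n_i ** alpha: alpha = 1 keeps
    # proportional-to-size sampling, alpha = 0 samples every dataset
    # uniformly, and values in between up-sample the smaller datasets.
    sizes = np.asarray(dataset_sizes, dtype=float)
    weights = sizes ** alpha
    return weights / weights.sum()

# Hypothetical sizes for three transfer datasets.
sizes = [100_000, 10_000, 1_000]
probs = smoothed_sampling_probs(sizes, alpha=0.5)

rng = np.random.default_rng(0)
for _ in range(5):
    i = rng.choice(len(sizes), p=probs)  # dataset providing the next batch
    print(f"next batch drawn from dataset {i} (p = {probs[i]:.2f})")

With alpha = 0.5 the probabilities above come out to roughly 0.71, 0.22, and 0.07, versus about 0.90, 0.09, and 0.01 under purely size-proportional sampling, which is why smoothing matters most when transfer dataset sizes are highly skewed.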

Cite (APA)

van der Goot, R. (2023). MaChAmp at SemEval-2023 tasks 2, 3, 4, 5, 7, 8, 9, 10, 11, and 12: On the Effectiveness of Intermediate Training on an Uncurated Collection of Datasets. In 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop (pp. 230–245). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.semeval-1.32
