Abstract
Previous studies have observed that finetuned models may be better base models than the vanilla pretrained model: a model finetuned on some source dataset may provide a better starting point for a new finetuning process on a desired target dataset. Here, we perform a systematic analysis of this intertraining scheme over a wide range of English classification tasks. Surprisingly, our analysis suggests that the potential intertraining gain can be analyzed independently for the target dataset under consideration and for the base model being considered as a starting point. Hence, a performant model is generally strong, even if its training data was not aligned with the target dataset. Furthermore, we leverage our analysis to propose a practical and efficient approach for determining if and how to select a base model in real-world settings. Finally, we release a continually updated ranking of the best models in the HuggingFace hub, per architecture.
Citation
Choshen, L., Venezian, E., Don-Yehiya, S., Slonim, N., & Katz, Y. (2023). Where to start? Analyzing the potential value of intermediate models. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1446–1470). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.90