Abstract
Previous studies have observed that finetuned models may be better base models than the vanilla pretrained model: a model finetuned on some source dataset may provide a better starting point for a new finetuning process on a desired target dataset. Here, we perform a systematic analysis of this intertraining scheme over a wide range of English classification tasks. Surprisingly, our analysis suggests that the potential intertraining gain can be analyzed independently for the target dataset under consideration and for the base model being considered as a starting point. Hence, a performant model is generally strong, even if its training data was not aligned with the target dataset. Furthermore, we leverage our analysis to propose a practical and efficient approach for determining if and how to select a base model in real-world settings. Finally, we release a continually updated ranking of the best models in the HuggingFace hub, per architecture.
Citation
Choshen, L., Venezian, E., Don-Yehiya, S., Slonim, N., & Katz, Y. (2023). Where to start? Analyzing the potential value of intermediate models. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1446–1470). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.90