Recent advancements in large language models have enabled them to perform well on complex tasks that require step-by-step reasoning with few-shot learning. However, it is unclear whether these models are applying reasoning skills learned during pre-training, or simply memorizing their training corpus at a finer granularity and becoming better at understanding their context. To address this question, we introduce ALERT, a benchmark and suite of analyses for evaluating the reasoning skills of language models. ALERT enables comparing pre-trained and finetuned models on complex tasks that require reasoning skills to solve. The benchmark spans over 20 datasets and covers 10 different reasoning skills, providing a test bed to assess any language model on fine-grained reasoning skills. To demonstrate the utility of ALERT, we investigate the role of finetuning. Our extensive empirical analysis shows that language models acquire reasoning skills such as textual entailment, abductive reasoning, and analogical reasoning during the finetuning stage more than during pretraining. We also find that when language models are finetuned they tend to overfit to the prompt template, which hurts model robustness and leads to generalization problems.
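To make the prompt-template finding concrete, the sketch below shows one way to probe template sensitivity: ask a model the same question under several paraphrased templates and compare its answers. This is only an illustration under assumed choices (the Hugging Face transformers API, the placeholder model name "gpt2", and the example templates and question), not the ALERT evaluation code.

```python
# Illustrative sketch (not the ALERT code): probe how sensitive a model is to
# the wording of its prompt template. Model name, templates, and the question
# are placeholders chosen for the example.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; substitute the finetuned model under study

# Paraphrases of the same instruction; a template-robust model should give
# consistent answers across all of them.
TEMPLATES = [
    "Question: {q}\nAnswer:",
    "Q: {q}\nA:",
    "Please answer the following question.\n{q}\nAnswer:",
]


def generate_answers(question: str) -> list[str]:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    answers = []
    for template in TEMPLATES:
        prompt = template.format(q=question)
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
        # Keep only the newly generated tokens, not the echoed prompt.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        answers.append(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
    return answers


if __name__ == "__main__":
    question = "If Tom is older than Ann and Ann is older than Sue, who is youngest?"
    for template, answer in zip(TEMPLATES, generate_answers(question)):
        print(repr(template), "->", repr(answer))
```

Large disagreement between the answers across templates would indicate the kind of template overfitting the abstract describes.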
CITATION STYLE
Yu, P., Wang, T., Golovneva, O., AlKhamissi, B., Verma, S., Jin, Z., … Celikyilmaz, A. (2023). ALERT: Adapting Language Models to Reasoning Tasks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 1055–1081). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.60