State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
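The abstract names prototypical networks as the basis of the proposed extension. The following is a rough sketch of the general nearest-prototype idea those networks rely on, not the authors' extension. Each class prototype is the mean of that class's support embeddings, and a query is assigned to the closest prototype. The toy two-dimensional vectors, the label names, the choice of squared Euclidean distance and the helper names `prototypes`, `squared_euclidean` and `classify` are all hypothetical. In practice the embeddings would come from a pre-trained encoder.

```python
import math
from collections import defaultdict


def prototypes(support):
    """Hypothetical helper: one prototype per label, the mean of its support embeddings."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def squared_euclidean(a, b):
    """Assumed distance function: squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, protos):
    """Return the nearest label and a softmax over negative distances to each prototype."""
    labels = list(protos)
    logits = [-squared_euclidean(query, protos[label]) for label in labels]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Toy support set of (embedding, label) pairs; the values are invented for illustration.
support = [
    ([0.0, 0.0], "O"),
    ([0.2, 0.0], "O"),
    ([5.0, 5.0], "PER"),
    ([5.2, 4.8], "PER"),
    ([-5.0, 5.0], "LOC"),
]
protos = prototypes(support)
label, probs = classify([4.9, 5.1], protos)
```

Because a prototype needs only a handful of labelled examples per class, this kind of classifier is a natural fit for the few-shot and rare-entity settings the abstract describes.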
Tänzer, M., Ruder, S., & Rei, M. (2022). Memorisation versus Generalisation in Pre-trained Language Models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 7564–7578). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.521