Memorisation versus Generalisation in Pre-trained Language Models

33 Citations
73 Readers (Mendeley users who have this article in their library)

Abstract

State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
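
The proposed extension builds on prototypical networks, which label each token by its distance to class prototypes, i.e. mean embeddings computed from a small labelled support set. As a rough illustration only, and not the authors' implementation, a minimal prototypical classifier over hypothetical encoder outputs might look like this in PyTorch:

```python
import torch

def prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Mean embedding per class, computed from a small labelled support set."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)])

def classify(query_emb: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Assign each query token to the class of its nearest prototype (Euclidean distance)."""
    return torch.cdist(query_emb, protos).argmin(dim=1)

# Toy usage with hypothetical 16-dimensional encoder outputs and 3 entity classes.
support = torch.randn(10, 16)                          # embeddings of labelled support tokens
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])  # their entity labels
queries = torch.randn(5, 16)                           # embeddings of tokens to classify

protos = prototypes(support, labels, num_classes=3)
print(classify(queries, protos))                       # predicted class index per query token
```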

Citation (APA)

Tänzer, M., Ruder, S., & Rei, M. (2022). Memorisation versus Generalisation in Pre-trained Language Models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 7564–7578). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.521

Readers' Seniority

PhD / Postgrad / Masters / Doc: 13 (45%)
Researcher: 10 (34%)
Professor / Associate Prof.: 3 (10%)
Lecturer / Post doc: 3 (10%)

Readers' Discipline

Computer Science: 26 (76%)
Engineering: 3 (9%)
Linguistics: 3 (9%)
Medicine and Dentistry: 2 (6%)

Article Metrics

News Mentions: 1
