Memorisation versus Generalisation in Pre-trained Language Models

33 Citations
73 Readers (Mendeley users who have this article in their library)

Abstract

State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
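
The proposed extension builds on prototypical networks, which label each token by its distance to class prototypes, i.e. mean embeddings computed from a small labelled support set. As a rough illustration only, and not the authors' implementation, a minimal prototypical classifier over hypothetical encoder outputs might look like this in PyTorch:

```python
import torch

def prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Mean embedding per class, computed from a small labelled support set."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)])

def classify(query_emb: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Assign each query token to the class of its nearest prototype (Euclidean distance)."""
    return torch.cdist(query_emb, protos).argmin(dim=1)

# Toy usage with hypothetical 16-dimensional encoder outputs and 3 entity classes.
support = torch.randn(10, 16)                          # embeddings of labelled support tokens
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])  # their entity labels
queries = torch.randn(5, 16)                           # embeddings of tokens to classify

protos = prototypes(support, labels, num_classes=3)
print(classify(queries, protos))                       # predicted class index per query token
```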

Citation (APA)

Tänzer, M., Ruder, S., & Rei, M. (2022). Memorisation versus Generalisation in Pre-trained Language Models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 7564–7578). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.521

Readers' Seniority

PhD / Postgrad / Masters / Doc: 13 (45%)
Researcher: 10 (34%)
Professor / Associate Prof.: 3 (10%)
Lecturer / Post doc: 3 (10%)

Readers' Discipline

Computer Science: 26 (76%)
Engineering: 3 (9%)
Linguistics: 3 (9%)
Medicine and Dentistry: 2 (6%)

Article Metrics

News Mentions: 1
