Improving Contextual Representation with Gloss Regularized Pre-training

Citations of this article: 5
Readers (Mendeley users who have this article in their library): 39

Abstract

Though achieving impressive results on many NLP tasks, BERT-like masked language models (MLMs) encounter a discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representation of pre-training and inference from the perspective of word probability distribution. We discover that BERT risks neglecting contextual word similarity in pre-training. To tackle this issue, we propose an auxiliary gloss regularizer module for BERT pre-training (GR-BERT) to enhance word semantic similarity. By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, word similarity can be explicitly modeled. We design two architectures for GR-BERT and evaluate our model on downstream tasks. Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation. GR-BERT achieves a new state of the art in the lexical substitution task and greatly improves BERT sentence representation in both unsupervised and supervised STS tasks.
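The abstract describes a joint objective: predict masked words while aligning each masked token's contextual embedding with an embedding of its gloss. As a rough illustration only, and not the authors' implementation, the sketch below shows one way such a combined loss could look; the function name gloss_regularized_loss, the InfoNCE-style contrastive alignment, and the temperature and alpha weights are assumptions introduced here for clarity.

# Hypothetical sketch of a joint MLM + gloss-alignment objective (not the
# authors' released code). Shapes and the contrastive alignment term are
# illustrative assumptions based only on the abstract.
import torch
import torch.nn.functional as F


def gloss_regularized_loss(mlm_logits, mlm_labels, masked_ctx_emb, gloss_emb,
                           temperature=0.07, alpha=1.0):
    """Combine the standard MLM loss with a gloss-alignment regularizer.

    mlm_logits:     (num_masked, vocab_size) predictions at masked positions
    mlm_labels:     (num_masked,) gold token ids at masked positions
    masked_ctx_emb: (num_masked, dim) contextual embeddings of masked tokens
    gloss_emb:      (num_masked, dim) embeddings of the corresponding glosses
                    (e.g., produced by a separate gloss encoder)
    """
    # Standard masked-language-modeling cross entropy.
    mlm_loss = F.cross_entropy(mlm_logits, mlm_labels)

    # Align each contextual embedding with its own gloss via an InfoNCE-style
    # contrastive loss: the matching gloss is the positive, the other glosses
    # in the batch act as negatives.
    ctx = F.normalize(masked_ctx_emb, dim=-1)
    gls = F.normalize(gloss_emb, dim=-1)
    sim = ctx @ gls.t() / temperature              # (num_masked, num_masked)
    targets = torch.arange(sim.size(0), device=sim.device)
    align_loss = F.cross_entropy(sim, targets)

    return mlm_loss + alpha * align_loss


if __name__ == "__main__":
    # Toy shapes just to show the call; real inputs would come from a BERT
    # encoder and a gloss encoder.
    num_masked, vocab, dim = 4, 30522, 768
    loss = gloss_regularized_loss(
        mlm_logits=torch.randn(num_masked, vocab),
        mlm_labels=torch.randint(0, vocab, (num_masked,)),
        masked_ctx_emb=torch.randn(num_masked, dim),
        gloss_emb=torch.randn(num_masked, dim),
    )
    print(loss.item())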




Citation (APA)

Lin, Y., An, Z., Wu, P., & Ma, Z. (2022). Improving Contextual Representation with Gloss Regularized Pre-training. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 907–920). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-naacl.68

Readers' Seniority

PhD / Postgrad / Masters / Doc: 8 (62%)
Researcher: 4 (31%)
Lecturer / Post doc: 1 (8%)

Readers' Discipline

Computer Science: 12 (67%)
Linguistics: 4 (22%)
Neuroscience: 1 (6%)
Engineering: 1 (6%)
