Masked Latent Semantic Modeling: an Efficient Pre-training Alternative to Masked Language Modeling

Abstract

In this paper, we propose an alternative to the classic masked language modeling (MLM) pre-training paradigm, where we modify the objective from the reconstruction of the exact identity of randomly selected masked subwords to the prediction of their latent semantic properties. We coin the proposed pre-training technique masked latent semantic modeling (MLSM for short). To determine the latent semantic properties of the masked subwords in a contextualized manner, we rely on an unsupervised technique based on sparse coding. Our experimental results reveal that the fine-tuned performance of models pre-trained via MLSM is consistently and significantly better than that obtained with vanilla MLM pre-training and other strong baselines.
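To make the idea concrete, below is a minimal, hypothetical sketch of an MLSM-style training objective, not the authors' reference implementation. It assumes a frozen teacher encoder that provides contextual embeddings, a pre-learned sparse dictionary of K atoms, and a student that predicts a distribution over the K latent dimensions at masked positions instead of the exact subword identity. The function names, the projected-gradient approximation of the sparse codes, and the KL-divergence loss are all illustrative assumptions about the general approach described in the abstract.

```python
# Hypothetical MLSM-style objective sketch (PyTorch). Assumptions:
#   - teacher_hidden: contextual embeddings from a frozen teacher, shape (B, T, H)
#   - dictionary: a pre-learned sparse-coding dictionary, shape (K, H)
#   - student_logits: student predictions over the K latent dimensions, shape (B, T, K)
import torch
import torch.nn.functional as F


def mlsm_targets(teacher_hidden, dictionary, l1_penalty=0.05, n_steps=50, lr=0.1):
    """Approximate non-negative sparse codes of the teacher embeddings via
    projected gradient descent, then normalize them into target distributions."""
    B, T, H = teacher_hidden.shape
    K = dictionary.shape[0]
    codes = torch.zeros(B, T, K, device=teacher_hidden.device, requires_grad=True)
    optim = torch.optim.Adam([codes], lr=lr)
    for _ in range(n_steps):
        recon = codes @ dictionary                                  # (B, T, H)
        loss = F.mse_loss(recon, teacher_hidden) + l1_penalty * codes.abs().mean()
        optim.zero_grad()
        loss.backward()
        optim.step()
        codes.data.clamp_(min=0.0)                                  # keep codes non-negative
    with torch.no_grad():
        return codes / codes.sum(dim=-1, keepdim=True).clamp(min=1e-9)


def mlsm_loss(student_logits, targets, mask):
    """KL divergence between the student's predicted latent distribution and the
    sparse-code target, averaged over masked positions only (mask: bool, (B, T))."""
    log_probs = F.log_softmax(student_logits, dim=-1)               # (B, T, K)
    kl = F.kl_div(log_probs, targets, reduction="none").sum(-1)     # (B, T)
    return (kl * mask).sum() / mask.sum().clamp(min=1)
```

In this sketch the only change relative to MLM is the prediction head and loss: rather than a softmax over the full subword vocabulary trained with cross-entropy against a one-hot target, the student matches a soft distribution over latent semantic dimensions derived from sparse codes of the teacher's contextual representation of the masked position.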

Citation (APA)

Berend, G. (2023). Masked Latent Semantic Modeling: an Efficient Pre-training Alternative to Masked Language Modeling. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 13949–13962). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.876
