LMSOC: An Approach for Socially Sensitive Pretraining


Abstract

While large-scale pretrained language models have been shown to learn effective linguistic representations for many NLP tasks, there remain many real-world contextual aspects of language that current approaches do not capture. For instance, consider the cloze test "I enjoyed the ____ game this weekend": the correct answer depends heavily on where the speaker is from, when the utterance occurred, and the speaker's broader social milieu and preferences. Although language depends heavily on the geographical, temporal, and other social contexts of the speaker, these elements have not been incorporated into modern transformer-based language models. We propose a simple but effective approach to incorporate speaker social context into the learned representations of large-scale language models. Our method first learns dense representations of social contexts using graph representation learning algorithms and then primes language model pretraining with these social context representations. We evaluate our approach on geographically sensitive language modeling tasks and show a substantial improvement (more than 100% relative lift on MRR) compared to baselines.
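The abstract's two-step recipe (learn social-context embeddings, then prime the language model with them) can be sketched in minimal form. This is an illustrative sketch only, not the paper's implementation: the context keys, embedding sizes, and the choice to prepend the frozen context vector to the token embeddings are all assumptions, and the graph-learned embeddings are mocked with random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # illustrative embedding dimension, not the paper's

# Stand-in for embeddings learned by a graph representation learner
# (e.g. node2vec over a social graph); keys are hypothetical.
context_embeddings = {
    ("city:Boston", "year:2019"): rng.normal(size=DIM),
}

def embed_tokens(tokens, dim=DIM):
    """Stand-in for a language model's token embedding lookup."""
    return rng.normal(size=(len(tokens), dim))

def prime_with_social_context(tokens, context_key):
    """Prepend the (frozen) social-context vector to the token
    embeddings, so the LM conditions on speaker context."""
    ctx = context_embeddings[context_key]   # shape (DIM,)
    tok = embed_tokens(tokens)              # shape (T, DIM)
    return np.vstack([ctx[None, :], tok])   # shape (T+1, DIM)

seq = prime_with_social_context(
    ["I", "enjoyed", "the", "[MASK]", "game"],
    ("city:Boston", "year:2019"),
)
print(seq.shape)  # (6, 8): 5 tokens plus one context slot
```

The key design point mirrored here is that the social-context representation is produced separately (step one) and merely conditions the language model's input (step two), rather than being learned jointly with the transformer.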

Citation (APA)

Kulkarni, V., Mishra, S., & Haghighi, A. (2021). LMSOC: An Approach for Socially Sensitive Pretraining. In Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 (pp. 2967–2975). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-emnlp.254
