Human Language Modeling

16 citations · 52 Mendeley readers

Abstract

Natural language is generated by people, yet traditional language modeling views words or documents as if generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension to the language modeling problem whereby a human level exists to connect sequences of documents (e.g., social media messages) and capture the notion that human language is moderated by changing human states. We introduce HaRT, a large-scale transformer model for the HuLM task, pre-trained on approximately 100,000 social media users, and demonstrate its effectiveness in terms of both language modeling (perplexity) for social media and fine-tuning for four downstream tasks spanning document and user levels: stance detection, sentiment classification, age estimation, and personality assessment. Results on all tasks meet or surpass the current state of the art.
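
As a rough formalization of this setup (the notation below is an illustrative sketch, not necessarily the paper's own): a standard language model treats each document d = (w_1, ..., w_T) in isolation,

    P(d) = \prod_{t=1}^{T} P(w_t \mid w_{1:t-1}),

whereas the HuLM task additionally conditions on an evolving human (user) state u_i carried across the sequence of documents d^{(1)}, ..., d^{(n)} written by the same person:

    P\big(w_t^{(i)} \mid w_{1:t-1}^{(i)},\, u_{i-1}\big), \qquad u_i = f\big(u_{i-1}, d^{(i)}\big),

where f stands for whatever update the model applies to the user state after each document; the abstract specifies only that such a human level connects the document sequence.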

Citation (APA)

Soni, N., Matero, M., Balasubramanian, N., & Schwartz, H. A. (2022). Human language modeling. In Findings of the Association for Computational Linguistics: ACL 2022 (pp. 622–636). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.52
