Abstract
Natural language is generated by people, yet traditional language modeling views words or documents as if generated independently. Here, we propose human language modeling (HuLM), a hierarchical extension to the language modeling problem whereby a human level exists to connect sequences of documents (e.g., social media messages) and capture the notion that human language is moderated by changing human states. We introduce HaRT, a large-scale transformer model for the HuLM task, pre-trained on approximately 100,000 social media users, and demonstrate its effectiveness in terms of both language modeling (perplexity) for social media and fine-tuning on four downstream tasks spanning the document and user levels: stance detection, sentiment classification, age estimation, and personality assessment. Results on all tasks meet or surpass the current state of the art.
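To make the hierarchy concrete, here is a minimal sketch of the HuLM objective in our own notation (the state h_i, the documents d_i, and the update f are illustrative symbols, not equations quoted from the paper): standard language modeling estimates p(w_t | w_{1:t-1}) within a single document, whereas HuLM additionally conditions next-token prediction on a human state carried across a user's sequence of documents.

% Illustrative HuLM formalization (our notation, not verbatim from the paper):
% h_i summarizes the user's first i documents d_1, ..., d_i.
\[
  p\bigl(w^{(i)}_t \,\big|\, w^{(i)}_{1:t-1},\, h_{i-1}\bigr),
  \qquad
  h_i = f\bigl(h_{i-1},\, d_i\bigr)
\]
% where w^{(i)}_t is the t-th token of document d_i and f is a learned
% state update (HaRT realizes something of this form with recurrently
% updated user states inside a transformer).

Conditioning on the evolving state h is plausibly what allows a single model to address both the document-level tasks (stance, sentiment) and the user-level tasks (age, personality) listed above.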
Citation
Soni, N., Matero, M., Balasubramanian, N., & Schwartz, H. A. (2022). Human Language Modeling. In Findings of the Association for Computational Linguistics: ACL 2022 (pp. 622–636). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.52