SocBERT: A Pretrained Model for Social Media Text

Abstract

Pretrained language models (PLMs) trained on domain-specific data have proven effective for in-domain natural language processing (NLP) tasks. Our work aimed to develop a language model that is effective for NLP tasks on data from diverse social media platforms. We pretrained a language model on English Twitter and Reddit posts comprising 929M sequence blocks for 112K steps. We benchmarked our model against three transformer-based models (BERT, BERTweet, and RoBERTa) on 40 social media text classification tasks. Although our model did not perform best on every task, it outperformed the baseline model, BERT, on most of them, which illustrates its effectiveness. Our work also offers insights into how to improve the efficiency of pretraining PLMs.
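
For context, the following is a minimal, hedged sketch of how one benchmark comparison of this kind could be run: fine-tuning a pretrained transformer encoder on a social media text classification task with the Hugging Face transformers and datasets libraries. The checkpoint name ("bert-base-uncased" as the BERT baseline) and the stand-in dataset ("tweet_eval"/"irony") are illustrative assumptions, not details drawn from the paper.

```python
# Hedged sketch: fine-tune a pretrained encoder on a binary social media
# classification task. Checkpoint and dataset names are placeholders, not
# the paper's exact experimental setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"  # baseline; swap in a social-media PLM to compare
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A public tweet classification dataset, used here only as a stand-in for
# one of the 40 benchmark tasks.
dataset = load_dataset("tweet_eval", "irony")

def tokenize(batch):
    # Truncate/pad posts to a fixed length for batched fine-tuning.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
print(trainer.evaluate())  # validation metrics for this model/task pair
```

Repeating the same fine-tuning loop for each candidate model and task, and comparing the resulting evaluation scores, mirrors the kind of head-to-head comparison described in the abstract.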

Citation (APA)

Guo, Y., & Sarker, A. (2023). SocBERT: A Pretrained Model for Social Media Text. In ACL 2023 - 4th Workshop on Insights from Negative Results in NLP, Proceedings (pp. 45–52). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.insights-1.5
