Pretraining language models (PLMs) on domain-specific data has proven effective for in-domain natural language processing (NLP) tasks. Our work aimed to develop a language model that is effective for NLP tasks on data from diverse social media platforms. We pretrained a language model on English Twitter and Reddit posts, comprising 929M sequence blocks, for 112K steps. We benchmarked our model against three transformer-based models (BERT, BERTweet, and RoBERTa) on 40 social media text classification tasks. Although our model did not perform best on every task, it outperformed the baseline model, BERT, on most of them, illustrating its effectiveness. Our work also provides insights into how to improve the efficiency of training PLMs.
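The benchmarking described above follows the standard recipe of fine-tuning a pretrained encoder with a sequence-classification head on each downstream task. The sketch below illustrates that recipe with the Hugging Face Transformers Trainer; the model identifier, toy dataset, and hyperparameters are placeholders for illustration only and are not the authors' released checkpoint or settings.

```python
# Minimal sketch of fine-tuning a pretrained encoder on a social media text
# classification task. MODEL_NAME and the tiny in-memory dataset are
# illustrative placeholders, not artifacts released with the paper.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

MODEL_NAME = "bert-base-uncased"  # placeholder; swap in the social-media PLM checkpoint

# Toy binary classification data standing in for one of the 40 benchmark tasks.
train_data = Dataset.from_dict({
    "text": ["great thread, totally agree", "this post is misleading"],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Truncate/pad posts to a fixed length before feeding them to the encoder.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

# Attach a randomly initialized classification head on top of the pretrained encoder.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        num_train_epochs=1,
        per_device_train_batch_size=8,
    ),
    train_dataset=train_data,
)
trainer.train()
```

In practice, the same loop would be repeated per task, with a held-out test set and a task-appropriate metric (e.g., F1) used to compare the pretrained models.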
Guo, Y., & Sarker, A. (2023). SocBERT: A Pretrained Model for Social Media Text. In ACL 2023 - 4th Workshop on Insights from Negative Results in NLP, Proceedings (pp. 45–52). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.insights-1.5