Abstract
In this paper, we build several pre-trained models to participate in SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media. Pre-trained models such as Bidirectional Encoder Representations from Transformers (BERT) have achieved good results on the general task of offensive language identification in social media. We preprocess the dataset according to the language habits of social network users. To address the data imbalance in OffensEval, we filter the newly provided machine-annotated samples to construct a new dataset, which we use to fine-tune the Robustly Optimized BERT Pretraining Approach (RoBERTa). For English subtask B, we add Auxiliary Sentences (AS) to transform the single-sentence classification task into a task of recognizing the relationship between two sentences. Our team, UJNLP, ranked 16th of 85 in English subtask A (offensive language identification).
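The auxiliary-sentence idea for subtask B can be illustrated with a minimal sketch. It is not the authors' implementation: the label names TIN (targeted) and UNT (untargeted) follow the OffensEval subtask B scheme, while the auxiliary sentence wording and the helper names `build_pairs` and `predict_label` are assumptions made for illustration. Each tweet is paired with one auxiliary sentence per candidate label, and the sentence-pair model only has to decide whether the pair matches.

```python
from typing import Dict, List, Tuple

# Hypothetical auxiliary sentences, one per subtask B label.
# The wording is an assumption; the abstract does not give the actual text.
AUXILIARY_SENTENCES: Dict[str, str] = {
    "TIN": "This tweet contains a targeted insult or threat.",
    "UNT": "This tweet contains untargeted profanity.",
}


def build_pairs(tweet: str, gold_label: str) -> List[Tuple[str, str, int]]:
    """Turn one labeled tweet into (tweet, auxiliary sentence, match) triples.

    match is 1 when the auxiliary sentence describes the gold label, else 0,
    so the original single-sentence task becomes a sentence-pair task.
    """
    if gold_label not in AUXILIARY_SENTENCES:
        raise ValueError(f"unknown label: {gold_label!r}")
    return [
        (tweet, auxiliary, int(label == gold_label))
        for label, auxiliary in AUXILIARY_SENTENCES.items()
    ]


def predict_label(match_scores: Dict[str, float]) -> str:
    """Pick the label whose auxiliary sentence received the highest match score."""
    return max(match_scores, key=match_scores.get)


pairs = build_pairs("@USER you are a fool", "TIN")
for tweet, auxiliary, match in pairs:
    print(match, "|", tweet, "|", auxiliary)
```

In a real pipeline, each (tweet, auxiliary sentence) pair would be encoded as a two-segment input for a sentence-pair classifier such as RoBERTa, and `predict_label` would receive that classifier's per-label match scores.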
Cite
Yao, Y., Su, N., & Ma, K. (2020). UJNLP at SemEval-2020 Task 12: Detecting Offensive Language Using Bidirectional Transformers. In 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings (pp. 2203–2208). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.293