nlpUP at SemEval-2020 Task 12: A Blazing Fast System for Offensive Language Detection

Ehab Hamdy; Jelena Mitrović; Michael Granitzer

Conference Proceedings

14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings (2020) 2098-2104

DOI: 10.18653/v1/2020.semeval-1.278

Abstract

In this paper, we describe our submission to SemEval-2020 Task 12, sub-tasks A and B, on offensive language identification and categorization in English tweets. This year's dataset for sub-task A is significantly larger than the previous year's. We therefore adapted the BlazingText algorithm to extract embedding representations and classify texts, after filtering and sanitizing the dataset according to conventional text patterns on social media. This gave us both a fast training process and a good F1 score of 90.88% on the test set. For sub-task B, we opted to fine-tune Bidirectional Encoder Representations from Transformers (BERT) to accommodate the limited data available for categorizing offensive tweets. We initially achieved an F1 score of only 56.86%, but after experimenting with various label assignment thresholds in the pre-processing steps, the F1 score improved to 64%.
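
The abstract does not spell out the label assignment thresholds used for sub-task B. As a minimal sketch, assuming each training tweet carries a confidence score in [0, 1] rather than a hard label, thresholding could look like the following. The function names, the cut-offs 0.3 and 0.7, and the label strings `TIN` and `UNT` are illustrative assumptions, not the authors' settings.

```python
from typing import Iterable, List, Optional, Tuple


def assign_label(score: float, low: float, high: float) -> Optional[str]:
    """Map a confidence score to a hard label, or None if it is ambiguous.

    Assumption: a higher score means the positive class ("TIN") is more likely.
    """
    if score >= high:
        return "TIN"
    if score <= low:
        return "UNT"
    return None  # score falls between the thresholds: treat as unreliable


def build_training_set(
    rows: Iterable[Tuple[str, float]],
    low: float = 0.3,   # hypothetical cut-off, not taken from the paper
    high: float = 0.7,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep only tweets whose score is confidently on one side."""
    labelled = []
    for text, score in rows:
        label = assign_label(score, low, high)
        if label is not None:
            labelled.append((text, label))
    return labelled
```

Moving `low` and `high` closer together keeps more, noisier examples, while moving them apart keeps fewer, cleaner ones. This is presumably the kind of trade-off explored when varying the thresholds.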

Cite

Hamdy, E., Mitrović, J., & Granitzer, M. (2020). nlpUP at SemEval-2020 Task 12: A Blazing Fast System for Offensive Language Detection. In 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings (pp. 2098–2104). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.278
