CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media

Sayanta Paul; Sriparna Saha; Mohammed Hasanuzzaman

Conference Proceedings, Open Access

14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings (2020) 1925-1931

DOI: 10.18653/v1/2020.semeval-1.253

Abstract

The SemEval-2020 Task 12 (OffensEval) challenge focuses on detecting signs of offensiveness in posts and comments on social media. The task was organized for several languages, including Arabic, Danish, English, Greek and Turkish. For English it featured three related sub-tasks: sub-task A was to discriminate between offensive and non-offensive posts, sub-task B focused on the type of offensive content in a post, and sub-task C required systems to identify the target of the offensive posts. The corpus for each language was built from posts and comments on Twitter, a popular social media platform. We participated in this challenge and submitted results for several languages. This work presents a range of machine learning and deep learning techniques for offensiveness prediction, covering various classifiers and feature engineering schemes, and analyzes their performance. The experimental analysis on the training set shows that an SVM using language-specific pre-trained word embeddings (fastText) outperforms the other methods. Our system achieves a macro-averaged F1 score of 0.45 for Arabic, 0.43 for Greek and 0.54 for Turkish.
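The scores above are macro-averaged F1 values: F1 is computed separately for each class and the per-class values are averaged with equal weight, so the minority (offensive) class counts as much as the majority class. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `macro_f1`, the label strings `OFF`/`NOT` and the toy data are illustrative assumptions.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up for illustration): OFF = offensive, NOT = not offensive
y_true = ["OFF", "NOT", "NOT", "NOT", "OFF", "NOT"]
y_pred = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.3f}")
```

In this toy case the offensive class has F1 = 0.5 and the non-offensive class has F1 = 0.75, so the macro average is 0.625, even though four of the six predictions are correct.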

Cite

Paul, S., Saha, S., & Hasanuzzaman, M. (2020). CyberTronics at SemEval-2020 Task 12: Multilingual Offensive Language Identification over Social Media. In 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings (pp. 1925–1931). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.253
