Abstract
This paper presents our system, LIIR, for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We participated in Subtask A for English, Danish, Greek, Arabic, and Turkish. We adapt and fine-tune the BERT and multilingual BERT models made available by Google AI for English and the non-English languages, respectively. For English, we use a combination of two fine-tuned BERT models. For the other languages, we propose a cross-lingual augmentation approach to enrich the training data, and we use multilingual BERT to obtain sentence representations. LIIR ranked 14/38, 18/47, 24/86, 24/54, and 25/40 for Greek, Turkish, English, Arabic, and Danish, respectively.
Cite
Ghadery, E., & Moens, M. F. (2020). LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification. In 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings (pp. 2073–2079). International Committee for Computational Linguistics. https://doi.org/10.18653/v1/2020.semeval-1.274