SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection

David Dale; Igor Markov; Varvara Logacheva; Olga Kozlova; Nikita Semenov; Alexander Panchenko

Conference Proceedings (Open Access)

SemEval 2021 - 15th International Workshop on Semantic Evaluation, Proceedings of the Workshop (2021) 927-934

DOI: 10.18653/v1/2021.semeval-1.126

Abstract

This work describes the participation of the Skoltech NLP group team (Sk) in the Toxic Spans Detection task at SemEval-2021. The goal of the task is to identify the most toxic fragments of a given sentence, which is a binary sequence tagging problem. We show that fine-tuning a RoBERTa model for this problem is a strong baseline. This baseline can be further improved by pre-training the RoBERTa model on a large dataset labeled for toxicity at the sentence level. Although our solution ranked only within the top 20% of participating models, it is just 2 points below the best result, which suggests that our approach is viable.
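
To illustrate the binary sequence tagging framing mentioned in the abstract, here is a minimal sketch that turns a set of toxic character offsets into per-token 0/1 labels. It assumes the gold annotation is a set of character positions and uses plain whitespace tokenization for simplicity; a RoBERTa-based tagger would label subword tokens instead. The function name `char_offsets_to_token_labels` and the example sentence are hypothetical and not taken from the paper.

```python
import re
from typing import List, Set, Tuple


def char_offsets_to_token_labels(text: str, toxic_chars: Set[int]) -> List[Tuple[str, int]]:
    """Label each whitespace-delimited token 1 if any of its characters is toxic, else 0.

    Assumption: toxic_chars holds 0-based character positions marked as toxic.
    """
    labeled = []
    for match in re.finditer(r"\S+", text):
        # A token counts as toxic if it overlaps the annotated characters at all.
        is_toxic = any(i in toxic_chars for i in range(match.start(), match.end()))
        labeled.append((match.group(), int(is_toxic)))
    return labeled


# Hypothetical example: characters 10..15 cover the word "stupid".
text = "You are a stupid person"
toxic_chars = set(range(10, 16))
labels = char_offsets_to_token_labels(text, toxic_chars)
print(labels)
```

Marking a token as toxic on any overlap is one possible convention; a stricter variant could require that most of the token's characters be annotated.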

Cite

Dale, D., Markov, I., Logacheva, V., Kozlova, O., Semenov, N., & Panchenko, A. (2021). SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection. In SemEval 2021 - 15th International Workshop on Semantic Evaluation, Proceedings of the Workshop (pp. 927–934). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.semeval-1.126
