SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection

4Citations
Citations of this article
45Readers
Mendeley users who have this article in their library.

Abstract

This work describes the participation of the Skoltech NLP group team (Sk) in the Toxic Spans Detection task at SemEval-2021. The goal of the task is to identify the most toxic fragments of a given sentence, which is a binary sequence tagging problem. We show that fine-tuning a RoBERTa model for this problem is a strong baseline. This baseline can be further improved by pre-training the RoBERTa model on a large dataset labeled for toxicity at the sentence level. While our solution scored among the top 20% participating models, it is only 2 points below the best result. This suggests the viability of our approach.

Cite

CITATION STYLE

APA

Dale, D., Markov, I., Logacheva, V., Kozlova, O., Semenov, N., & Panchenko, A. (2021). SkoltechNLP at SemEval-2021 Task 5: Leveraging Sentence-level Pre-training for Toxic Span Detection. In SemEval 2021 - 15th International Workshop on Semantic Evaluation, Proceedings of the Workshop (pp. 927–934). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.semeval-1.126

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free