Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction

Citations: 2 · Mendeley readers: 33

Abstract

This paper investigates the effectiveness of sentence-level transformers for zero-shot offensive span identification on a code-mixed Tamil dataset. More specifically, we evaluate the rationale extraction methods Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al., 2016a) and Integrated Gradients (IG) (Sundararajan et al., 2017) for adapting transformer-based offensive language classification models to zero-shot offensive span identification. We find that LIME and IG achieve baseline F1 scores of 26.35% and 44.83%, respectively. In addition, we study the effect of dataset size and training process on the overall accuracy of span identification. With Masked Data Augmentation and Multilabel Training, both LIME and IG show significant improvement, reaching F1 scores of 50.23% and 47.38%, respectively. Disclaimer: This paper contains examples that may be considered profane, vulgar, or offensive. These examples do not represent the views of the authors or their employers/graduate schools towards any person(s), group(s), practice(s), or entity/entities. Instead, they are used only to highlight the linguistic research challenges.
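As a rough illustration of the rationale-extraction idea described in the abstract, the sketch below derives candidate offensive spans from a sentence-level classifier by thresholding LIME word attributions. The model checkpoint name, label index, attribution threshold, and whitespace tokenization are assumptions made for illustration only; they are not the paper's exact configuration.

import torch
from lime.lime_text import LimeTextExplainer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint (assumption): any binary offensive / not-offensive classifier.
MODEL_NAME = "your-offensive-language-classifier"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def predict_proba(texts):
    # LIME's classifier_fn: list of strings -> (n_samples, n_classes) probability array.
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

def offensive_spans(text, offensive_label=1, threshold=0.0, num_samples=500):
    # Keep words whose LIME weight toward the offensive class exceeds the threshold.
    # Note: LIME splits on non-word characters by default, so words with attached
    # punctuation may not match the whitespace tokens used here; this is a sketch.
    explainer = LimeTextExplainer(class_names=["not-offensive", "offensive"])
    exp = explainer.explain_instance(
        text,
        predict_proba,
        labels=(offensive_label,),
        num_features=len(text.split()),
        num_samples=num_samples,
    )
    weights = dict(exp.as_list(label=offensive_label))
    return [w for w in text.split() if weights.get(w, 0.0) > threshold]

print(offensive_spans("some code-mixed comment here"))

Integrated Gradients can be used analogously, for example by attributing the offensive-class logit to the input token embeddings (e.g., with Captum's LayerIntegratedGradients) and thresholding the per-token attribution scores.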

Cite (APA)

Ravikiran, M., & Chakravarthi, B. R. (2022). Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction. In DravidianLangTech 2022 - 2nd Workshop on Speech and Language Technologies for Dravidian Languages, Proceedings of the Workshop (pp. 240–247). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.dravidianlangtech-1.37
