Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

Abstract

The widespread presence of offensive language on social media has motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-shot and other transfer learning experiments with state-of-the-art cross-lingual transformers using existing data in Bengali, English, and Hindi.

Cite

Gaikwad, S., Ranasinghe, T., Zampieri, M., & Homan, C. M. (2021). Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 437–443). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_050
