A New Corpus and Lexicon for Offensive Tamazight Language Detection

Abstract

In this paper, we address offensive language detection for the Tamazight language, an under-resourced language whose language technology is still in its infancy and which lacks a standard orthography. We are particularly interested in the Kabyle dialect, mainly spoken in some cities of northern Algeria (i.e. Tizi-ouzou and Bejaïa). We propose a new corpus of offensive Tamazight language (the OTAM corpus) comprising 6.2k texts, as well as a new lexicon of offensive and abusive Tamazight words with 12.6k entries. We evaluated several baseline machine learning and deep learning classifiers, and the results show that acceptable performance can be achieved without feature engineering.

Cite

Abainia, K., Kara, K., & Hamouni, T. (2022). A New Corpus and Lexicon for Offensive Tamazight Language Detection. In Proceedings of the 7th International Workshop on Social Media World Sensors, SIDEWAYS 2022. Association for Computing Machinery, Inc. https://doi.org/10.1145/3544795.3544852
