Abstract
In this paper, we address offensive language detection for the Tamazight language, one of the under-resourced languages that are still in their infancy and lack a standard orthography. We are particularly interested in the Kabyle dialect, mainly spoken in some cities of northern Algeria (i.e. Tizi-ouzou and Bejaïa). We propose a new corpus of offensive Tamazight language (the OTAM corpus) comprising 6.2k texts, as well as a new lexicon of offensive and abusive Tamazight words with 12.6k entries. We evaluated several baseline machine learning and deep learning classifiers, and the results show that acceptable performance can be achieved without feature engineering.
Cite
Abainia, K., Kara, K., & Hamouni, T. (2022). A New Corpus and Lexicon for Offensive Tamazight Language Detection. In Proceedings of the 7th International Workshop on Social Media World Sensors, SIDEWAYS 2022. Association for Computing Machinery, Inc. https://doi.org/10.1145/3544795.3544852