A Lexicon for Profane and Obscene Text Identification in Bengali

Abstract

Bengali is a low-resource language that lacks tools and resources for detecting profane and obscene textual content. Until now, no lexicon has existed for detecting obscenity in Bengali social media text. This study introduces a Bengali obscene lexicon consisting of over 200 Bengali terms that can be considered filthy, slang, profane, or obscene. A semiautomatic methodology is presented for developing the obscene lexicon; it leverages an obscene corpus, word embeddings, and part-of-speech (POS) taggers. The developed lexicon achieves coverage of around 0.85 for obscene and profane content detection on an evaluation dataset. The experimental results suggest that the developed lexicon is effective at identifying obscenity in Bengali social media content.
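
The abstract does not define how coverage is computed. As a rough illustration only, the sketch below assumes coverage means the fraction of obscene texts in an evaluation set that contain at least one lexicon term. The function name `lexicon_coverage`, the whitespace tokenization, and the placeholder terms and texts are all hypothetical and are not taken from the paper.

```python
def lexicon_coverage(texts, lexicon):
    """Return the fraction of texts that contain at least one lexicon term.

    Assumption: tokens are separated by whitespace; the paper may use a
    different tokenizer and a different definition of coverage.
    """
    if not texts:
        return 0.0
    terms = set(lexicon)
    # A text counts as covered when any of its tokens is in the lexicon.
    hits = sum(1 for text in texts if terms & set(text.split()))
    return hits / len(texts)


# Placeholder data (hypothetical): stand-ins for lexicon entries and
# for evaluation texts that annotators labeled as obscene.
lexicon = {"term_a", "term_b"}
obscene_texts = [
    "this has term_a inside",
    "nothing flagged here",
    "term_b and term_a both",
    "term_b again",
]

coverage = lexicon_coverage(obscene_texts, lexicon)
print(coverage)
```

Exact token matching of this kind would miss inflected or misspelled variants, which is one reason a real evaluation might rely on a more careful tokenizer or normalization step.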

Cite

Sazzed, S. (2021). A Lexicon for Profane and Obscene Text Identification in Bengali. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 1289–1296). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_145