The increasing propagation of abusive language in social media is a major concern for supplier companies and governments because of its negative social impact. A large number of methods have been developed for its automatic identification, ranging from dictionary-based methods to sophisticated deep learning approaches. A common problem in all these methods is distinguishing the offensive use of swear words from their everyday and humorous usage. To tackle this particular issue, we propose an attention-based neural network architecture that captures the importance of word n-grams according to their context. The results obtained on four standard collections from Twitter and Facebook are encouraging: they outperform the $$F_1$$ scores of state-of-the-art methods and allow identifying a set of inherently offensive swear words, as well as others whose interpretation depends on context.
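As an illustration of the general idea of attention over word n-grams, the following is a minimal sketch and not the architecture of the paper, whose details are not given in this abstract. It assumes a generic attention pooling: each n-gram vector is scored against a context vector, the scores are normalized with a softmax, and the n-gram vectors are averaged using those weights. The toy embeddings in `EMB`, the context vector `CONTEXT`, and all function names are hypothetical; in a trained model the embeddings and the context vector would be learned.

```python
import math

def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-dimensional toy embeddings (assumption, not from the paper).
EMB = {
    "you": [0.1, 0.0],
    "are": [0.0, 0.1],
    "a": [0.0, 0.0],
    "damn": [0.6, 0.4],
    "idiot": [0.9, 0.2],
}

def embed(ngram):
    """Represent an n-gram as the mean of its word vectors (zeros if unknown)."""
    vectors = [EMB.get(word, [0.0, 0.0]) for word in ngram]
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(2)]

def attend(ngram_vectors, context):
    """Score each n-gram against the context vector and pool by attention weight."""
    scores = [
        sum(math.tanh(x) * c for x, c in zip(vec, context))
        for vec in ngram_vectors
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, ngram_vectors))
        for d in range(len(context))
    ]
    return weights, pooled

# Hand-picked context vector (assumption); a real model would learn it.
CONTEXT = [1.0, 0.5]

tokens = "you are a damn idiot".split()
bigrams = word_ngrams(tokens, 2)
weights, pooled = attend([embed(g) for g in bigrams], CONTEXT)

for gram, weight in zip(bigrams, weights):
    print(" ".join(gram), round(weight, 3))
```

The printed weights indicate how much each bigram contributes to the pooled representation. Inspecting such weights is one way a model of this kind could separate n-grams that are offensive regardless of context from those whose weight changes with the surrounding words.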
CITATION STYLE
Jarquín-Vásquez, H. J., Montes-y-Gómez, M., & Villaseñor-Pineda, L. (2020). Not all swear words are used equal: Attention over word n-grams for abusive language identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12088 LNCS, pp. 282–292). Springer. https://doi.org/10.1007/978-3-030-49076-8_27