Toxic Comment Classification Implementing CNN Combining Word Embedding Technique

Abstract

With the advancement of technology, the virtual world and social media have become an important part of people’s everyday lives. Social media allows people to connect, share their emotions and discuss various subjects, yet it has also become a place for cyberbullying, personal attacks, online harassment, verbal abuse and other kinds of toxic comments. Top social media platforms still struggle to classify such comments quickly and accurately enough to remove them automatically. In this paper, an ensemble methodology combining a convolutional neural network (CNN) with natural language processing (NLP) is proposed. In the first phase it separates toxic from non-toxic comments, and it then classifies and labels the comments into six types, using the dataset of Wikipedia’s talk page edits collected from Kaggle. The proposed architecture starts with data preprocessing: data cleaning, NLP techniques such as tokenization and stemming, and conversion of words into vectors with word embedding techniques. Combining the preprocessed dataset with the best-performing word embedding method, the CNN model achieves a ROC-AUC of 98.46 and an accuracy of 98.05% for toxic comment classification, which is higher than the existing works it was compared with.
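The preprocessing stage described in the abstract (data cleaning, tokenization, stemming, then turning words into something an embedding layer can consume) can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' code: the abstract does not give the exact cleaning rules or stemmer, so the regular expressions, the small `STOP_WORDS` set and the `crude_stem` suffix stripper are hypothetical stand-ins (a real pipeline would more likely use a full stemmer such as Porter's). The final step maps tokens to padded integer ids, which is the form an embedding lookup expects.

```python
import re

# Hypothetical stop-word list for illustration only; real pipelines use a fuller list.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "you", "this"}


def clean(text: str) -> str:
    """Lowercase, drop URLs, replace non-letters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization of cleaned text, dropping stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def crude_stem(token: str) -> str:
    """Hypothetical toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Clean -> tokenize -> stem."""
    return [crude_stem(tok) for tok in tokenize(clean(text))]


def build_vocab(docs: list[list[str]]) -> dict[str, int]:
    """Assign ids in order of first appearance; 0 = padding, 1 = unknown word."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids, truncate to max_len, right-pad with 0."""
    ids = [vocab.get(tok, 1) for tok in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


comments = ["You are SO annoying!!! http://x.co", "Thanks for fixing the edits."]
docs = [preprocess(c) for c in comments]
vocab = build_vocab(docs)
print(docs[0], encode(docs[0], vocab, 5))
```

Padding to a fixed `max_len` is a design choice that lets every comment be fed to a convolutional layer as a tensor of the same shape; the value 5 here is arbitrary.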
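The abstract describes a CNN applied over word-embedded input, working in two phases: a toxic/non-toxic decision first, then labelling into six types. The sketch below shows the forward pass of a generic one-layer text CNN (embedding lookup, 1D convolution, ReLU, global max pooling, sigmoid outputs) wired into that two-phase flow. It is a hypothetical illustration, not the paper's architecture, which the abstract does not specify. The layer sizes, the `threshold` of 0.5 and the class and function names are assumptions, the weights are random and untrained (training is omitted), and the six label names are assumed to be those of the Kaggle Wikipedia talk-page dataset.

```python
import math
import random

# Assumed label names of the Kaggle Wikipedia talk-page dataset.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv1d_maxpool(seq, filters, biases, width):
    """Slide each (width x emb_dim) filter over the embedded sequence,
    apply ReLU, and keep the largest response (global max pooling)."""
    features = []
    for filt, b in zip(filters, biases):
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            s = b + sum(w * x
                        for w_row, x_row in zip(filt, window)
                        for w, x in zip(w_row, x_row))
            best = max(best, max(0.0, s))
        features.append(best)
    return features


def dense(features, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [b + sum(w * x for w, x in zip(row, features))
            for row, b in zip(weights, biases)]


class TinyTextCNN:
    """Hypothetical untrained one-layer text CNN with random weights."""

    def __init__(self, vocab_size, emb_dim=8, n_filters=4, width=3, n_out=1, seed=0):
        rng = random.Random(seed)

        def u():
            return rng.uniform(-0.1, 0.1)

        self.width = width
        self.emb = [[u() for _ in range(emb_dim)] for _ in range(vocab_size)]
        self.emb[0] = [0.0] * emb_dim  # id 0 is padding and maps to a zero vector
        self.filters = [[[u() for _ in range(emb_dim)] for _ in range(width)]
                        for _ in range(n_filters)]
        self.filter_bias = [0.0] * n_filters
        self.out_w = [[u() for _ in range(n_filters)] for _ in range(n_out)]
        self.out_b = [0.0] * n_out

    def predict(self, ids):
        seq = [self.emb[i] for i in ids]  # embedding lookup
        feats = conv1d_maxpool(seq, self.filters, self.filter_bias, self.width)
        return [sigmoid(z) for z in dense(feats, self.out_w, self.out_b)]


def classify(ids, gate, labeler, threshold=0.5):
    """Phase 1: binary toxic gate. Phase 2: six independent sigmoid labels."""
    p_toxic = gate.predict(ids)[0]
    if p_toxic < threshold:
        return {"toxic_prob": p_toxic, "labels": []}
    probs = labeler.predict(ids)
    return {"toxic_prob": p_toxic,
            "labels": [name for name, p in zip(LABELS, probs) if p >= threshold]}


gate = TinyTextCNN(vocab_size=10, n_out=1, seed=1)
labeler = TinyTextCNN(vocab_size=10, n_out=len(LABELS), seed=2)
print(classify([2, 3, 4, 0, 0], gate, labeler))
```

Using independent sigmoids rather than a softmax in the second phase reflects that one comment can carry several labels at once (for example both obscene and insult).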
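The reported figures (ROC-AUC 98.46 and 98.05% accuracy) rest on two standard metrics. ROC-AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting one half. The abstract does not say how the score was aggregated over the six labels; averaging the per-label AUCs (mean column-wise ROC AUC, the metric the Kaggle Toxic Comment Classification Challenge used) is assumed here as one common choice. The 0.5 decision threshold for accuracy is likewise an assumption. A minimal standard-library sketch:

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(y_true_rows, score_rows):
    """Average per-label AUC; rows are comments, columns are labels."""
    n_labels = len(y_true_rows[0])
    aucs = [roc_auc([row[j] for row in y_true_rows], [row[j] for row in score_rows])
            for j in range(n_labels)]
    return sum(aucs) / len(aucs)


def accuracy(y_true, scores, threshold=0.5):
    """Share of examples whose thresholded score matches the true label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(y_true, scores))
    return hits / len(y_true)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
print(accuracy([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

The pairwise loop costs O(P·N) and is only meant to make the definition explicit; for a dataset the size of the Wikipedia talk-page corpus, a sort-based rank formulation or a library routine would be the practical choice.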

Citation (APA)

Pavel, M. I., Razzak, R., Sengupta, K., Niloy, M. D. K., Muqith, M. B., & Tan, S. Y. (2021). Toxic Comment Classification Implementing CNN Combining Word Embedding Technique. In Lecture Notes in Networks and Systems (Vol. 173 LNNS, pp. 897–909). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-33-4305-4_65
