Toxic Comment Classification Implementing CNN Combining Word Embedding Technique

Monirul Islam Pavel; Razia Razzak; Katha Sengupta; Md Dilshad Kabir Niloy; Munim Bin Muqith; Siok Yee Tan

Conference Proceedings

Lecture Notes in Networks and Systems (2021) 173 LNNS 897-909

DOI: 10.1007/978-981-33-4305-4_65

Abstract

With the advancement of technology, the virtual world and social media have become an important part of people’s everyday lives. Social media allows people to connect, share their emotions and discuss various subjects, yet it has also become a place for cyberbullying, personal attacks, online harassment, verbal abuse and other kinds of toxic comments. Top social media platforms still struggle to classify such toxic comments quickly and accurately enough to remove them automatically. In this paper, an ensemble methodology of a convolutional neural network (CNN) and natural language processing (NLP) is proposed. It separates toxic from non-toxic comments in a first phase and then classifies and labels the comments into six types, based on the dataset of Wikipedia’s talk page edits collected from Kaggle. The proposed architecture begins with data preprocessing: data cleaning, NLP techniques such as tokenization and stemming, and conversion of words into vectors with word embedding techniques. Combining the preprocessed dataset with the best-performing word embedding method, the CNN model achieves a ROC-AUC of 98.46 and an accuracy of 98.05% for toxic comment classification, which is higher than the existing works it was compared with.
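The pipeline summarized in the abstract (cleaning, tokenization, stemming, word embedding, a convolutional layer, evaluation by ROC-AUC) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the suffix-stripping stemmer, the two-dimensional embedding table `EMBEDDINGS`, the hand-set filter weights and all function names are hypothetical stand-ins for a real stemmer (such as Porter's), learned or pretrained word vectors, and trained CNN weights.

```python
import re

# Hypothetical suffix list: a crude stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "ly", "es", "s")

# Hypothetical 2-d embedding table; a real model would use learned or
# pretrained vectors with far more dimensions.
EMBEDDINGS = {"stupid": [1.0, 0.0], "idiot": [1.0, 0.0], "edit": [0.0, 1.0]}


def clean(text: str) -> str:
    """Lower-case, drop URLs, and replace every run of non-letters with a space."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^a-z]+", " ", text)


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization of already-cleaned text."""
    return text.split()


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Cleaning -> tokenization -> stemming."""
    return [stem(t) for t in tokenize(clean(text))]


def embed(tokens: list[str], dim: int = 2) -> list[list[float]]:
    """Look up each token; out-of-vocabulary tokens map to a zero vector."""
    return [EMBEDDINGS.get(t, [0.0] * dim) for t in tokens]


def conv1d_max(seq, weights, bias=0.0):
    """Slide one filter (width = len(weights)) over the embedded sequence,
    apply ReLU, then max-over-time pooling. Sequences shorter than the
    filter yield 0.0 because there is no window to score."""
    width = len(weights)
    if len(seq) < width:
        return 0.0
    activations = []
    for i in range(len(seq) - width + 1):
        total = bias
        for w_row, x in zip(weights, seq[i:i + width]):
            total += sum(wk * xk for wk, xk in zip(w_row, x))
        activations.append(max(0.0, total))
    return max(activations)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive example outscores
    a random negative one (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    tokens = preprocess("You are STUPID, stop editing!")
    print(tokens)  # ['you', 'are', 'stupid', 'stop', 'edit']
    # Hypothetical width-2 filter that fires on the "insult" embedding direction.
    print(conv1d_max(embed(tokens), [[0.5, 0.0], [1.0, 0.0]]))  # 1.0
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75
```

A trained text CNN would learn many filters of several widths and feed the pooled features to output units (one sigmoid per label in a multi-label setup); the single hand-set filter here only shows how convolution and max-over-time pooling turn a variable-length comment into a fixed-size feature.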

Citation (APA)

Pavel, M. I., Razzak, R., Sengupta, K., Niloy, M. D. K., Muqith, M. B., & Tan, S. Y. (2021). Toxic Comment Classification Implementing CNN Combining Word Embedding Technique. In Lecture Notes in Networks and Systems (Vol. 173 LNNS, pp. 897–909). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-33-4305-4_65
