Classification of Imbalanced Offensive Dataset – Sentence Generation for Minority Class with LSTM

Ekin Ekinci

Sakarya University Journal of Computer and Information Sciences (2022) 5(1) 121-133

DOI: 10.35377/saucis...1070822

Abstract

Document classification is a long-studied problem that remains an active research topic. As social media has become part of daily life, and with its misuse, the importance of text classification has grown. This paper investigates the effect of data augmentation with sentence generation on classification performance in an imbalanced dataset. We propose an LSTM-based sentence generation method, represent the text with Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec, and apply Logistic Regression (LR), Support Vector Machine (SVM), K Nearest Neighbor (KNN), Multilayer Perceptron (MLP), Extremely Randomized Trees (Extra Trees), Random Forest, eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost) and Bagging. Our experimental results on the imbalanced Offensive Language Identification Dataset (OLID) show that machine learning combined with sentence generation yields significantly better classification performance.
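
As a rough illustration of the TF-IDF representation named in the abstract, the sketch below weights terms in a toy corpus using only the Python standard library. The weighting formula (relative term frequency times log(N / df)), the function name `tfidf`, and the example sentences are assumptions made for illustration; the abstract does not state which TF-IDF variant or tokenization the paper uses, and the sentences are not taken from OLID.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: (count / document length) * log(N / df).
    """
    # Naive whitespace tokenization, assumed here for simplicity.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy corpus invented for illustration (not from OLID).
toy_docs = [
    "you are great",
    "you are awful",
    "great match today",
]
weights = tfidf(toy_docs)
```

Under this weighting, a term that occurs in only one document (such as "awful") receives a higher weight than a term shared across documents (such as "you"), which is what makes the representation useful as classifier input.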
