Feature Expansion Using Word2vec for Hate Speech Detection on Indonesian Twitter with Classification Using SVM and Random Forest

Mila Putri Kartika Dewi; Erwin Budi Setiawan

Journal Article (Open Access)


JURNAL MEDIA INFORMATIKA BUDIDARMA (2022) 6(2) 979

DOI: 10.30865/mib.v6i2.3855


Abstract

Hate speech is one of the most common problems on Twitter. Because tweets are limited to 280 characters, users write with many word variations, which can lead to vocabulary mismatches. This study aims to address that problem by building a hate speech detection system for Indonesian-language Twitter. It uses 20,571 tweets and applies the Feature Expansion method with Word2vec to reduce vocabulary mismatch. Bag of Words (BOW) and Term Frequency-Inverse Document Frequency (TF-IDF) are used to represent feature values in tweets. Two classifiers are compared: Support Vector Machine (SVM) and Random Forest (RF). The best result, an accuracy of 88.37%, was obtained with Feature Expansion and TF-IDF weighting in the Random Forest classifier. Across several tests, Feature Expansion with TF-IDF weighting increased accuracy in detecting hate speech and helped overcome vocabulary mismatches.
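The abstract does not spell out the expansion rule, so the following is only a minimal sketch of one common form of feature expansion, not the authors' implementation. It assumes that a vocabulary term with a zero TF-IDF value in a tweet is filled in when a similar word appears in that tweet. The `SIMILAR` table, the function names, and the choice to fill with the term's IDF weight are all hypothetical; in practice the similarity table would be built from the nearest neighbours of a trained Word2vec model.

```python
import math
from collections import Counter

# Hypothetical similarity table (assumption). In a real pipeline these lists
# would come from a trained Word2vec model, e.g. the top-k most similar words.
SIMILAR = {
    "benci": ["muak", "kesal"],
    "bodoh": ["dungu"],
}

def tokenize(text):
    # Simplistic whitespace tokenizer; real preprocessing would be richer.
    return text.lower().split()

def build_vocab(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def idf_weights(docs, vocab):
    # Smoothed inverse document frequency for every vocabulary term.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}

def tfidf_vector(doc, vocab, idf):
    # Raw term count times IDF; absent terms get 0.0.
    tf = Counter(tokenize(doc))
    return [tf[t] * idf[t] for t in vocab]

def expand(vec, doc, vocab, idf, similar):
    # Assumed expansion rule: if a term is absent from the tweet (value 0)
    # but one of its similar words is present, fill in the term's IDF weight.
    tokens = set(tokenize(doc))
    out = list(vec)
    for i, term in enumerate(vocab):
        if out[i] == 0.0 and any(s in tokens for s in similar.get(term, [])):
            out[i] = idf[term]
    return out

docs = ["aku benci kamu", "dia muak sekali", "kamu bodoh"]
vocab = build_vocab(docs)
idf = idf_weights(docs, vocab)

base = tfidf_vector(docs[1], vocab, idf)
expanded = expand(base, docs[1], vocab, idf, SIMILAR)
```

Here the second tweet never contains "benci", so its base vector has a zero in that position. Because "muak" is listed as similar, the expanded vector gets a nonzero weight there. The expanded vectors would then be passed to a classifier such as SVM or Random Forest, which this sketch leaves out.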

Cite

Dewi, M. P. K., & Setiawan, E. B. (2022). Feature Expansion Using Word2vec for Hate Speech Detection on Indonesian Twitter with Classification Using SVM and Random Forest. JURNAL MEDIA INFORMATIKA BUDIDARMA, 6(2), 979. https://doi.org/10.30865/mib.v6i2.3855
