Hate speech is one of the most common cases on Twitter. It is limited to 280 characters in uploading tweets, resulting in many word variations and possible vocabulary mismatches. Therefore, this study aims to overcome these problems and build a hate speech detection system on Indonesian Twitter. This study uses 20,571 tweet data and implements the Feature Expansion method using Word2vec to overcome vocabulary mismatches. Other methods applied are Bag of Word (BOW) and Term Frequency-Inverse Document Frequency (TF-IDF) to represent feature values in tweets. This study examines two methods in the classification process, namely Support Vector Machine (SVM) and Random Forest (RF). The final result shows that the Feature Expansion method with TF-IDF weighting in the Random Forest classification gives the best accuracy result, which is 88,37%. The Feature Expansion method with TF-IDF weighting can increase the accuracy value from several tests in detecting hate speech and overcoming vocabulary mismatches.
CITATION STYLE
Dewi, M. P. K., & Setiawan, E. B. (2022). Feature Expansion Using Word2vec for Hate Speech Detection on Indonesian Twitter with Classification Using SVM and Random Forest. JURNAL MEDIA INFORMATIKA BUDIDARMA, 6(2), 979. https://doi.org/10.30865/mib.v6i2.3855
Mendeley helps you to discover research relevant for your work.