Indonesian Hate Speech Text Classification Using Improved K-Nearest Neighbor with TF-IDF-ICSρF

  • Saputra N
  • Aeni K
  • Saraswati N

Abstract

Purpose: Freedom on social media lets users post messages that may disturb or harm others, but this freedom is limited by Indonesia's Electronic Information and Transactions Law (UU ITE). This research aims to find an effective method for classifying Indonesian hate speech text into multiple categories, with the goal of helping to reduce such cases.

Methods: This study used 1,000 samples collected from Twitter with five labels: religion, race, physical, gender, and other (invective or slander). The pipeline consisted of several preprocessing steps, data transformation using TF-IDF-ICSρF term weighting, and classification using an Improved KNN algorithm. The results were then compared with a baseline that combines TF-IDF weighting with standard KNN.

Result: The combination of TF-IDF-ICSρF and Improved KNN achieved an average accuracy of 88.11%, which is 17.81 percentage points higher than the 70.30% obtained by TF-IDF with K-Nearest Neighbor on the same data and parameters.

Novelty: Based on these comparison results, the TF-IDF-ICSρF and Improved KNN methods can effectively classify hate speech sentences across many labels with fairly good accuracy.
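The abstract names TF-IDF-ICSρF as the term-weighting scheme but does not state its formula. The Python sketch below assumes one formulation from the class-indexing term-weighting literature, in which the TF-IDF weight is multiplied by an inverse class space density factor that boosts terms concentrated in few classes. The paper's exact definition (log base, smoothing, normalization) may differ; the function names `fit_tfidf_icsrf` and `weigh` and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def fit_tfidf_icsrf(docs, labels):
    """Learn IDF and ICS-rho-F factors from tokenized training documents.

    Assumed formulation (not confirmed by the abstract):
      idf(t)   = 1 + log(D / df(t))
      csr(t)   = sum over classes c of n_c(t) / N_c
      icsrf(t) = 1 + log(m / csr(t))
    D: number of training documents, df(t): documents containing t,
    n_c(t): class-c documents containing t, N_c: size of class c,
    m: number of classes. Natural log is used throughout.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()
    class_df = {c: Counter() for c in class_sizes}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # document frequency counts each term once
            df[term] += 1
            class_df[label][term] += 1
    idf = {t: 1 + math.log(n_docs / df[t]) for t in df}
    icsrf = {}
    for t in df:
        # Class space density: fraction of each class's documents containing t.
        density = sum(class_df[c][t] / class_sizes[c] for c in class_sizes)
        icsrf[t] = 1 + math.log(len(class_sizes) / density)
    return idf, icsrf


def weigh(tokens, idf, icsrf):
    """Return a sparse {term: weight} vector; unseen terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: f * idf[t] * icsrf[t] for t, f in tf.items()}


# Toy example with neutral Indonesian tokens (hypothetical data).
docs = [["agama", "ibadah"], ["agama", "badan"],
        ["wajah", "badan"], ["wajah", "badan", "badan"]]
labels = ["religion", "religion", "physical", "physical"]
idf, icsrf = fit_tfidf_icsrf(docs, labels)
print(weigh(["wajah", "badan", "badan"], idf, icsrf))
```

In the toy data, "ibadah" occurs in only one class and receives the largest class factor (1 + log 4), while "badan", which occurs in both classes, receives the smallest (1 + log(4/3)). Plain TF-IDF has no such class-aware term.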

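The abstract does not specify which Improved KNN variant is used. One variant described in the text-categorization literature gives each class its own neighbor count in proportion to its size, so that large classes do not dominate the vote. The Python sketch below assumes that variant, cosine similarity over sparse term-weight vectors, and one label per document; `cosine`, `improved_knn_predict`, and the toy vectors are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def improved_knn_predict(query, train_vecs, train_labels, k):
    """Class-size-adaptive KNN (assumed variant).

    Each class c uses its own neighbor count
        k_c = ceil(k * N_c / max_j N_j)
    and is scored by the share of similarity mass that class-c documents
    contribute among the query's top-k_c most similar training documents.
    """
    ranked = sorted(
        ((cosine(query, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    sizes = Counter(train_labels)
    largest = max(sizes.values())
    best_label, best_score = None, -1.0
    for c in sorted(sizes):  # sorted so ties resolve deterministically
        k_c = max(1, math.ceil(k * sizes[c] / largest))
        top = ranked[:k_c]
        total = sum(s for s, _ in top)
        score = sum(s for s, y in top if y == c) / total if total > 0 else 0.0
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy training vectors (hypothetical, already weighted).
train_vecs = [{"agama": 1.0}, {"agama": 1.0, "ibadah": 1.0},
              {"wajah": 1.0}, {"wajah": 1.0, "badan": 1.0}, {"badan": 1.0}]
train_labels = ["religion", "religion", "physical", "physical", "physical"]
print(improved_knn_predict({"agama": 1.0}, train_vecs, train_labels, k=3))  # → religion
```

The query must be weighted with the same scheme as the training documents. Replacing the per-class k_c with a single k and taking a majority vote over the top k neighbors gives the standard KNN rule, a typical form of the KNN baseline the abstract compares against.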
Citation (APA)

Saputra, N. A., Aeni, K., & Saraswati, N. M. (2024). Indonesian Hate Speech Text Classification Using Improved K-Nearest Neighbor with TF-IDF-ICSρF. Scientific Journal of Informatics, 11(1), 21–30. https://doi.org/10.15294/sji.v11i1.48085