According to WHO, around 71 million people were infected with the Hepatitis C virus in 2019, yet only 49.7% of people are aware of Hepatitis C. Early prevention is therefore essential to reduce the likelihood of severe outcomes. To support medical experts in reducing the risk of transmission, a program was developed that classifies Hepatitis C through automatic detection using machine learning models. Random Forest was chosen because it handles outliers and imbalanced data, which allows it to achieve high accuracy, and because it can identify important features. Naïve Bayes was chosen because its algorithm is simple yet still capable of high accuracy. After both models were tested, their predictions were evaluated using the confusion matrix. Without SMOTE, the test results were 93% for Random Forest and 88% for Naïve Bayes. Because the dataset is imbalanced, the minority class was then oversampled using SMOTE (Synthetic Minority Over-sampling Technique). With SMOTE, the test results were 98% for Random Forest and 89% for Naïve Bayes.
Keywords: Hepatitis C; Random Forest; Naïve Bayes; SMOTE; confusion matrix.
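To illustrate the two computations the abstract relies on, here is a minimal standard-library Python sketch, assuming numeric feature vectors and a single minority class. `smote` performs the usual SMOTE step of interpolating between a minority sample and one of its k nearest minority-class neighbours, and `confusion_matrix`/`accuracy` show how accuracy, one of the metrics derived from a confusion matrix, is read off its diagonal. All function names, the toy data, and defaults such as `k=5` are hypothetical; this is not the authors' implementation, and the abstract does not state which library they used.

```python
import math
import random

# Hypothetical helpers: a sketch of SMOTE and confusion-matrix accuracy,
# not the paper's implementation.


def euclidean(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    # Hypothetical toy minority class with two features per sample.
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
    print(len(smote(minority, n_new=4, k=2)))  # 4

    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], labels=[0, 1])
    print(cm)            # [[1, 1], [0, 2]]
    print(accuracy(cm))  # 0.75
```

In a typical workflow, oversampling is applied only to the training split, so that synthetic points do not leak into the data used to build the confusion matrix.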
Sharfina, N., & Ramadhan, N. G. (2023). Analisis SMOTE Pada Klasifikasi Hepatitis C Berbasis Random Forest dan Naïve Bayes. JOINTECS (Journal of Information Technology and Computer Science), 8(1), 33. https://doi.org/10.31328/jointecs.v8i1.4456