Content Classification based-on Latent Semantic Analysis and Support Vector Machine (LSA-SVM)

Gita Indah Marthasari; Nur Hayatin; Maulidya Yuniarti

Journal Article (Open Access)

Jurnal Transformatika (2022) 19(2) 144-150

DOI: 10.26623/transformatika.v19i2.2745

Abstract

The diverse content of web pages can have a negative impact when it reaches the wrong users, and almost half of internet users are children. It is therefore important to classify web pages to determine which pages are suitable for children and which are not. One method that can be used is the Support Vector Machine (SVM) algorithm. SVM is a binary classifier that works by finding the hyperplane that best separates the two classes. To improve classification accuracy, SVM is combined with Latent Semantic Analysis (LSA). The data used in this study were taken from the DMOZ web directory, whose pages had already been classified into two categories. The data first pass through a preprocessing stage, after which features are extracted using LSA, which captures the semantic similarities between the words and texts contained in web pages. The extracted features are then classified using an SVM with a radial basis function (RBF) kernel. In testing, this approach achieved a classification accuracy of 64%.
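The pipeline described in the abstract (preprocessing, LSA feature extraction, then an RBF-kernel SVM) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the tokenizer, term weighting, number of latent dimensions, or SVM parameters, so the TF-IDF weighting, `k=2`, `gamma=1.0`, `C=1.0`, the hypothetical class name `LsaSvm`, and the toy corpus with labels 1 (suitable for children) and 0 (not suitable) are all assumptions. LSA is done with a truncated SVD of the document-term matrix. The SVM is trained with a simple dual coordinate descent solver, with the bias absorbed by adding 1 to the kernel, in place of a library solver such as LIBSVM.

```python
import re

import numpy as np


def tokenize(text):
    # Assumed preprocessing: lowercase and keep alphabetic tokens only
    return re.findall(r"[a-z]+", text.lower())


class LsaSvm:
    """Hypothetical LSA-SVM sketch: TF-IDF -> truncated SVD (LSA) -> RBF-kernel SVM."""

    def __init__(self, k=2, gamma=1.0, C=1.0, epochs=200):
        self.k, self.gamma, self.C, self.epochs = k, gamma, C, epochs

    def _counts(self, docs):
        # Document-term count matrix over the training vocabulary (unknown words dropped)
        X = np.zeros((len(docs), len(self.vocab)))
        for i, doc in enumerate(docs):
            for tok in tokenize(doc):
                if tok in self.vocab:
                    X[i, self.vocab[tok]] += 1.0
        return X

    def _project(self, docs):
        # Fold documents into the k-dimensional latent space: z = (tf-idf row) @ V_k
        return (self._counts(docs) * self.idf) @ self.Vk

    def _kernel(self, A, B):
        # RBF kernel exp(-gamma * ||a - b||^2); the +1 absorbs the SVM bias term
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
        return np.exp(-self.gamma * np.maximum(sq, 0.0)) + 1.0

    def fit(self, docs, labels):
        words = sorted({t for d in docs for t in tokenize(d)})
        self.vocab = {w: j for j, w in enumerate(words)}
        counts = self._counts(docs)
        # Smoothed IDF weighting (assumed): log(n / df) + 1
        self.idf = np.log(len(docs) / (counts > 0).sum(axis=0)) + 1.0
        # LSA: truncated SVD X ~ U_k S_k V_k^T, keep the top-k right singular vectors
        _, _, Vt = np.linalg.svd(counts * self.idf, full_matrices=False)
        self.Vk = Vt[: self.k].T
        self.Z = self._project(docs)
        self.y = np.where(np.asarray(labels) == 1, 1.0, -1.0)
        K = self._kernel(self.Z, self.Z)
        alpha = np.zeros(len(docs))
        # Dual coordinate descent on the soft-margin SVM dual, 0 <= alpha_i <= C
        for _ in range(self.epochs):
            for i in range(len(docs)):
                grad = self.y[i] * (K[i] @ (alpha * self.y)) - 1.0
                alpha[i] = min(max(alpha[i] - grad / K[i, i], 0.0), self.C)
        self.alpha = alpha
        return self

    def predict(self, docs):
        # Decision function f(x) = sum_j alpha_j y_j K(z_j, x); label 1 if f(x) >= 0
        scores = self._kernel(self._project(docs), self.Z) @ (self.alpha * self.y)
        return np.where(scores >= 0.0, 1, 0)


# Hypothetical toy corpus: 1 = suitable for children, 0 = not suitable
train_docs = [
    "cartoon games kids", "cartoon stories kids", "games stories kids",
    "casino poker bets", "casino odds bets", "poker odds bets",
]
train_labels = [1, 1, 1, 0, 0, 0]
model = LsaSvm(k=2).fit(train_docs, train_labels)
print(model.predict(["fun cartoon games", "online casino poker odds"]))
```

New documents are projected into the latent space with the same `V_k` as the training documents, so words unseen during training are simply ignored. In practice, library components such as scikit-learn's `TruncatedSVD` and `SVC(kernel="rbf")` would be the more usual choice for each stage.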

Citation (APA)

Marthasari, G. I., Hayatin, N., & Yuniarti, M. (2022). Content Classification based-on Latent Semantic Analysis and Support Vector Machine (LSA-SVM). Jurnal Transformatika, 19(2), 144–150. https://doi.org/10.26623/transformatika.v19i2.2745
