Content Classification based-on Latent Semantic Analysis and Support Vector Machine (LSA-SVM)

  • Marthasari G
  • Hayatin N
  • Yuniarti M

Abstract

The diverse content of web pages can have a negative impact when it reaches the wrong users. Almost half of internet users are children, so it is important to classify web pages to determine which pages are suitable for children and which are not. One method that can be used is the Support Vector Machine (SVM) algorithm. SVM is a binary classifier whose working principle is to find the hyperplane that best separates the two classes, i.e. the one with the maximum margin between them. To obtain better classification accuracy, SVM is combined with the Latent Semantic Analysis (LSA) algorithm. The data used in this study were taken from DMOZ web data that had already been classified into two categories. The data first pass through a pre-processing stage and then undergo feature extraction using LSA, which is used to capture the semantic similarity between the words and texts contained in web pages. The extracted features are then classified using an SVM with a radial basis function (RBF) kernel. In testing, this approach achieved a classification accuracy of 64%.
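The LSA-then-SVM pipeline described in the abstract can be sketched in a few lines with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: TF-IDF weighting with English stop-word removal stands in for the unspecified pre-processing, `TruncatedSVD` performs the LSA step, and `SVC(kernel="rbf")` is the RBF-kernel SVM. The toy pages, their labels, the `build_lsa_svm` helper and its `n_topics` parameter are all hypothetical; the DMOZ data is not reproduced here.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy pages (not the paper's DMOZ data).
# Label 1 = suitable for children, 0 = not suitable.
pages = [
    "fun puzzle games and cartoons for kids",
    "kids sing songs and play animal games",
    "school homework games for kids to learn reading",
    "online casino poker with betting odds",
    "casino slots and poker gambling tips",
    "win cash at poker and blackjack casino tables",
]
labels = [1, 1, 1, 0, 0, 0]


def build_lsa_svm(n_topics=2):
    """Hypothetical helper: TF-IDF -> LSA (truncated SVD) -> RBF-kernel SVM."""
    return make_pipeline(
        # Pre-processing and term weighting: lowercase, drop English stop words, TF-IDF.
        TfidfVectorizer(stop_words="english"),
        # LSA: project TF-IDF vectors onto the top n_topics latent dimensions.
        TruncatedSVD(n_components=n_topics, random_state=0),
        # SVM with an RBF kernel on the latent features.
        SVC(kernel="rbf", gamma="scale", C=1.0),
    )


model = build_lsa_svm()
model.fit(pages, labels)

queries = ["fun cartoons and games for kids", "poker and blackjack at an online casino"]
predictions = model.predict(queries)
print(predictions)
```

Because LSA maps pages that share vocabulary onto nearby latent directions, the SVM separates the classes in a 2-dimensional space rather than in the full term space. On real data the number of latent dimensions and the SVM's `C` and `gamma` would need tuning; the values here are illustrative only.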

Citation (APA)

Marthasari, G. I., Hayatin, N., & Yuniarti, M. (2022). Content Classification based-on Latent Semantic Analysis and Support Vector Machine (LSA-SVM). Jurnal Transformatika, 19(2), 144–150. https://doi.org/10.26623/transformatika.v19i2.2745
