The Qur'an contains so much information that retrieving it manually is difficult, especially for someone who wants to study the Qur'an in more depth. There is therefore a need to find information by topic among Qur'anic verses that have already been classified, particularly because a single verse may have more than one topic (multilabel). This research examines how to build a classifier for multilabel data, namely the topics of Qur'anic verses, using the k-Nearest Neighbor method. It compares two feature extraction methods, Weighted TF-IDF and TF-IDF, and finds that Weighted TF-IDF performs better than standard TF-IDF. The best result, obtained by searching for the optimal value of k, is at k = 25, with an average Hamming loss of 0.134875 (lower is better). Further tests measure the effect of stopword removal and lemmatization at the optimal k: without stopword removal the result is 0.136375, without lemmatization it is 0.13537, and with neither stopword removal nor lemmatization the average Hamming loss is 0.1373125.
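To make the pipeline in the abstract concrete, the following is a minimal sketch of multilabel k-Nearest Neighbor classification over TF-IDF vectors, scored with Hamming loss. It is an illustration under stated assumptions, not the authors' implementation: it uses plain TF-IDF only (the paper's Weighted TF-IDF scheme is not described in the abstract and is not reproduced here), cosine similarity and per-label majority voting are assumed choices, and the toy documents, label columns, and all function names are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Compute an IDF value per term from tokenised training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # The "+ 1.0" smoothing is an assumption so that very common terms keep some weight.
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    """Plain TF-IDF vector as a {term: weight} dict; terms unseen in training are ignored."""
    counts = Counter(t for t in doc if t in idf)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_multilabel(query, train_vecs, train_labels, k):
    """Assign each label that a strict majority of the k nearest neighbours carry."""
    nearest = sorted(range(len(train_vecs)),
                     key=lambda i: cosine(query, train_vecs[i]),
                     reverse=True)[:k]
    n_labels = len(train_labels[0])
    return [1 if 2 * sum(train_labels[i][j] for i in nearest) > k else 0
            for j in range(n_labels)]

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return wrong / total

# Hypothetical toy data: label columns are [worship, social].
train_docs = [["prayer", "fasting"], ["prayer", "charity"],
              ["charity", "orphans"], ["fasting", "patience"]]
train_labels = [[1, 0], [1, 1], [0, 1], [1, 0]]

idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]
prediction = knn_multilabel(tfidf(["prayer", "charity"], idf),
                            train_vecs, train_labels, k=3)

# One wrong slot out of four label slots gives a Hamming loss of 0.25.
loss = hamming_loss([[1, 0], [0, 1]], [[1, 1], [0, 1]])
```

In a study like the one described, k would be chosen by repeating this evaluation over a range of values and keeping the one with the lowest average Hamming loss.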
CITATION STYLE
Ulumudin, G. I., Adiwijaya, A., & Mubarok, M. S. (2019). A multilabel classification on topics of qur’anic verses in English translation using K-Nearest Neighbor method with Weighted TF-IDF. In Journal of Physics: Conference Series (Vol. 1192). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1192/1/012026