Identifying Optimal Baseline Variant of Unsupervised Term Weighting in Question Classification Based on Bloom Taxonomy

Anbuselvan Sangodiah; Tham Jee San; Yong Tien Fui; Lim Ean Heng; Ramesh Kumar Ayyasamy; Norazira Binti A. Jalil

Journal article (open access)

Mendel (2022) 28(1) 8-22

DOI: 10.13164/mendel.2022.1.008

Abstract

Examinations are one of the common ways to evaluate students' cognitive levels in higher education institutions. Educators label exam questions manually according to the cognitive domain of Bloom's taxonomy (BT). To ease this burden, several past works have proposed automated question classification based on BT using machine learning techniques. Feature selection, feature extraction, and term weighting are common ways to improve the accuracy of question classification. The term weighting methods commonly used in past work are unsupervised, namely TF and TF-IDF. Several variants of TF and TF-IDF exist, and the optimal variant has yet to be identified for question classification based on BT. Therefore, this paper studies the TF, TF-IDF, and normalized TF-IDF variants to identify the optimal variants for use as a baseline term weighting scheme. Two classifiers were used to investigate the variants: Support Vector Machine (SVM) and Naïve Bayes. With the SVM classifier, the average accuracies of the TF-IDF and normalized TF-IDF variants were 63.7% and 71.7%, respectively; with the Naïve Bayes classifier, they were 62.4% and 63.4%, respectively. In general, the normalized TF-IDF variants outperformed the TF and TF-IDF variants in both accuracy and F1-measure. Further statistical analysis using a t-test shows that the differences in accuracy between normalized TF-IDF and both TF and TF-IDF are significant. Among the normalized TF-IDF variants, Normalized TF-IDF2 achieved the highest accuracy (73.3%), whereas among the unnormalized TF-IDF variants, TF-IDF3 achieved the highest accuracy (70.8%).
As a result, the Normalized TF-IDF2 and unnormalized TF-IDF3 variants are useful for benchmarking and comparison against other term weighting techniques in future research on question classification based on BT.
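The abstract names three families of unsupervised term weighting (TF, TF-IDF, and normalized TF-IDF) but does not define the individual variants such as TF-IDF3 or Normalized TF-IDF2. The Python sketch below is therefore an assumption: it shows one common textbook form of each family (raw term frequency, TF multiplied by an unsmoothed IDF of log(N/df), and that TF-IDF vector scaled to unit L2 length), not the paper's exact formulas. The helper names and the toy questions are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tokenized exam questions, for illustration only.
# Assumption: textbook weighting forms, not the paper's exact variant definitions.
QUESTIONS = [
    ["define", "the", "term", "algorithm"],
    ["explain", "the", "algorithm"],
    ["design", "a", "new", "algorithm"],
]


def tf_vectors(docs):
    """Raw term frequency: how often each term occurs in each document."""
    return [Counter(doc) for doc in docs]


def idf_weights(docs):
    """Unsmoothed IDF, log(N / df(t)); a term found in every document gets 0."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vectors(docs):
    """Unnormalized TF-IDF: raw term frequency multiplied by IDF."""
    idf = idf_weights(docs)
    return [{term: count * idf[term] for term, count in tf.items()}
            for tf in tf_vectors(docs)]


def l2_normalize(vec):
    """Scale a weight vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}


def normalized_tfidf_vectors(docs):
    """L2-normalized TF-IDF, so short and long questions are on a comparable scale."""
    return [l2_normalize(vec) for vec in tfidf_vectors(docs)]


if __name__ == "__main__":
    for question, vec in zip(QUESTIONS, normalized_tfidf_vectors(QUESTIONS)):
        print(" ".join(question), {t: round(w, 3) for t, w in vec.items()})
```

Because "algorithm" occurs in every toy question, its IDF is 0 and it contributes nothing to any vector. Libraries often use other variants; for example, scikit-learn's `TfidfVectorizer` defaults to a smoothed IDF (`smooth_idf=True`) and L2 normalization (`norm="l2"`), so nominally identical schemes can produce different weights across implementations.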

Citation (APA):

Sangodiah, A., San, T. J., Fui, Y. T., Heng, L. E., Ayyasamy, R. K., & Jalil, N. B. A. (2022). Identifying Optimal Baseline Variant of Unsupervised Term Weighting in Question Classification Based on Bloom Taxonomy. Mendel, 28(1), 8–22. https://doi.org/10.13164/mendel.2022.1.008
