Selecting the Best Supervised Learning Algorithm for Recommending the Course in E-Learning System

  • Baher S
  • L.M.R.J. L
Abstract

In research [1], an experimental comparison of LibSVMs, C4.5, Bagging C4.5, AdaBoosting C4.5, and Random Forest was conducted on seven microarray cancer data sets. The experimental results showed that all ensemble methods outperformed C4.5, and that all five methods benefited in classification accuracy from data preprocessing, including gene selection and discretization. In addition to comparing the average accuracies of ten-fold cross-validation tests on the seven data sets, the authors used two statistical tests to validate their findings.

Abdelghani Bellaachia and Erhan Guven [2] presented an analysis of the prediction of the survivability rate of breast cancer patients using data mining techniques. The data used was the SEER Public-Use Data; the preprocessed data set consists of 151,886 records, each with all 16 fields available from the SEER database. They investigated three data mining techniques: Naïve Bayes, the back-propagated neural network, and the C4.5 decision tree algorithm. Several experiments were conducted using these algorithms, and the achieved prediction performances were comparable to existing techniques. However, they found that the C4.5 algorithm performed considerably better than the other two.

My Chau Tu, Dongil Shin, and Dongkyoo Shin [3] proposed the use of the C4.5 decision tree algorithm, bagging with C4.5, and bagging with the Naïve Bayes algorithm to identify heart disease in a patient, and compared the effectiveness and correct-classification rates of the three. The data they studied were collected from patients with coronary artery disease.

Aman Kumar Sharma and Suruchi Sahni [4] conducted experiments in the WEKA environment using four algorithms, namely ID3, J48, Simple CART, and Alternating Decision Tree, on a spam email dataset, and then compared the four algorithms in terms of classification accuracy.
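Most of the studies above rank classifiers by the average accuracy of ten-fold cross-validation. As an illustrative sketch (not taken from any of the cited papers), the fold construction behind such a comparison can be written in plain Python: the sample indices are partitioned into ten near-equal folds, and each round trains on nine folds and tests on the held-out one.

```python
def k_fold_indices(n_samples, k=10):
    """Partition sample indices 0..n_samples-1 into k near-equal folds
    and return (train_indices, test_indices) pairs, one per round."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        test = folds[i]
        # Train on every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# The reported accuracy is then the mean test accuracy over the k rounds.
splits = k_fold_indices(100, k=10)
```

Every sample appears in exactly one test fold, so the averaged accuracy uses each observation once for testing and nine times for training.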
According to their simulation results, the J48 classifier outperforms ID3, Simple CART, and ADTree in terms of classification accuracy.

Rich Caruana and Alexandru Niculescu-Mizil [5] presented a large-scale empirical comparison of ten supervised learning methods: SVMs, neural nets, logistic regression, naïve Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. They also examined the effect that calibrating the models via Platt scaling and isotonic regression has on their performance. An important aspect of their study was the use of a variety of performance criteria to evaluate the learning methods.

Eric Bauer and Ron Kohavi [6] provided a brief review of two families of voting algorithms: perturb-and-combine (e.g., Bagging) and boosting (e.g., AdaBoost, Arc-x4).

In research [7], twenty-two decision tree, nine statistical, and two neural network algorithms were compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy was measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is QUEST with linear splits, which ranks fourth and fifth on the two criteria, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times; Polyclass, for example, is third from last in median training time, often requiring hours of training where other algorithms need seconds. The QUEST and logistic regression algorithms are substantially faster.
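The two voting families reviewed by Bauer and Kohavi differ mainly in how the base classifiers' votes are combined: Bagging takes an unweighted majority over classifiers trained on bootstrap resamples, while AdaBoost weights each classifier by its training error. A minimal pure-Python sketch of the two combination rules, with hypothetical base-classifier predictions (the function names and example labels are illustrative, not from the cited papers):

```python
import math
from collections import Counter

def bagging_vote(predictions):
    """Unweighted majority vote, as in Bagging (perturb-and-combine)."""
    return Counter(predictions).most_common(1)[0][0]

def adaboost_classifier_weight(error):
    """AdaBoost assigns classifier t the weight
    alpha_t = 0.5 * ln((1 - e_t) / e_t), where e_t is its weighted error."""
    return 0.5 * math.log((1 - error) / error)

def weighted_vote(predictions, alphas):
    """Boosting-style vote: sum each classifier's weight onto its label."""
    scores = {}
    for label, alpha in zip(predictions, alphas):
        scores[label] = scores.get(label, 0.0) + alpha
    return max(scores, key=scores.get)

# Three hypothetical base classifiers labelling an email:
majority = bagging_vote(["spam", "ham", "spam"])        # -> "spam"
alphas = [adaboost_classifier_weight(e) for e in (0.1, 0.4, 0.45)]
weighted = weighted_vote(["ham", "spam", "spam"], alphas)  # -> "ham"
```

Note that under the weighted rule the single accurate classifier (error 0.1, weight about 1.10) outvotes the two weak ones (weights about 0.20 and 0.10), which is exactly what distinguishes boosting's combination step from Bagging's flat majority.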
Among decision tree algorithms with univariate splits, C4.5, IND-CART, and QUEST have the best combinations of error rate and speed, but C4.5 tends to produce trees with twice as many leaves as those from IND-CART and QUEST.

3. SUPERVISED CLASSIFICATION ALGORITHMS

Baher, S., & L.M.R.J., L. (2012). Selecting the Best Supervised Learning Algorithm for Recommending the Course in E-Learning System. International Journal of Computer Applications, 41(5), 42–49. https://doi.org/10.5120/5541-7597
