Abstract
Background/Objectives: The objective of this work is to identify the best classification algorithm for classifying connection records as normal or abnormal in the KDDCup 20% training dataset, using the WEKA tool.
Methods/Statistical Analysis: Ten classification algorithms were applied to the KDDCup 20% training dataset, which comprises 25192 instances, using 10-fold cross-validation. The tests were configured with the Paired T-Tester (corrected) at a significance level of 0.05. The comparison fields percent_correct, fmeasure, irrecall, irprecision and auc (area under ROC) were used for evaluation. Ranking and summary tests were also performed.
Findings: According to the results obtained with the Weka Experimenter for the 10 classifiers on the KDDCup 20% training dataset, the Random Forest classifier performs best on the comparison fields percent_correct, fmeasure and auc (area under ROC). The SimpleCart classifier ranks next to Random Forest on percent_correct and fmeasure, and outperforms all other classifiers on irprecision. ZeroR is found to be the worst classifier on all comparison fields other than irrecall. For this dataset, further detailed study could therefore be restricted to five classifiers: Random Forest, SimpleCart, J48, Bagging and IBk. Restricting the study in this way would reduce computation time and make classification of the KDDCup 20% dataset more efficient.
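The corrected paired t-test named in the methods can be illustrated with a short sketch. It assumes the correction is the corrected resampled t-test of Nadeau and Bengio, which widens the variance term by the test-to-train size ratio to account for overlapping training sets across folds. The function name, its parameters and the fold scores below are illustrative assumptions, not values or code from the paper.

```python
import math
import statistics


def corrected_paired_t(scores_a, scores_b, test_train_ratio):
    """Corrected resampled paired t statistic for two classifiers.

    scores_a, scores_b: per-fold scores (e.g. percent_correct) obtained
        on the same folds, in the same order.
    test_train_ratio: n_test / n_train for each fold; 1/9 for 10-fold
        cross-validation.
    Returns (t_statistic, degrees_of_freedom).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    mean_diff = statistics.fmean(diffs)
    var_diff = statistics.variance(diffs)  # sample variance (k - 1 denominator)

    if var_diff == 0:
        # Degenerate case: identical differences on every fold.
        # Returning 0 or a signed infinity is a choice made for this sketch.
        if mean_diff == 0:
            return 0.0, k - 1
        return math.copysign(math.inf, mean_diff), k - 1

    # The standard paired t-test uses 1/k alone; the correction adds
    # test_train_ratio to compensate for dependence between folds.
    t_stat = mean_diff / math.sqrt((1.0 / k + test_train_ratio) * var_diff)
    return t_stat, k - 1


if __name__ == "__main__":
    # Hypothetical per-fold accuracies, for illustration only.
    forest = [99.7, 99.8, 99.6, 99.9, 99.7, 99.8, 99.7, 99.6, 99.8, 99.9]
    baseline = [99.5, 99.6, 99.6, 99.7, 99.4, 99.7, 99.5, 99.6, 99.6, 99.7]
    t_stat, dof = corrected_paired_t(forest, baseline, test_train_ratio=1 / 9)
    # Two-sided critical value of Student's t at the 0.05 level with 9
    # degrees of freedom.
    critical = 2.262
    print(f"t = {t_stat:.3f}, dof = {dof}, significant = {abs(t_stat) > critical}")
```

With several repetitions of 10-fold cross-validation, the lists would hold one score per run and fold, and the critical value would have to be looked up for the corresponding degrees of freedom.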
Lakshmi, S. V., & Prabakaran, T. E. (2015). Performance analysis of multiple classifiers on KDD cup dataset using WEKA tool. Indian Journal of Science and Technology, 8(17). https://doi.org/10.17485/ijst/2015/v8i17/70324