On Determining the Most Effective Subset of Features for Detecting Phishing Websites

  • Hassan D

Abstract

Phishing websites mimic legitimate ones in order to steal users' confidential information, such as usernames, passwords and credit card details. Recently, machine learning and data mining techniques have emerged as a promising approach to detecting phishing websites by distinguishing them from legitimate ones. In this approach, detection is preceded by extracting various features from a website dataset to train a classifier to correctly identify phishing sites. However, not all extracted features are effective in classification, nor do they contribute equally to its performance. In this paper, we investigate the effect of feature selection on classification performance for predicting phishing sites. We evaluate various machine learning algorithms on a number of feature subsets, selected from an extracted feature set by various feature selection techniques, in order to determine the most effective subset of features, that is, the one that yields the best classification performance. Empirical results show that with our newly proposed methodology for selecting features, which removes redundant features that contribute equally to classification accuracy, the decision tree classifier achieves the best performance, with an overall accuracy of 95.40%, a false positive rate (FPR) of 0.046 and a false negative rate (FNR) of 0.065.
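
For readers unfamiliar with the three metrics quoted in the abstract, the following is a minimal sketch of how accuracy, FPR and FNR are conventionally computed from confusion-matrix counts. It assumes that phishing is treated as the positive class, which the abstract does not state. The function name `classification_rates` and the counts in the usage note are hypothetical and are not taken from the paper.

```python
def classification_rates(tp, fp, tn, fn):
    """Return (accuracy, FPR, FNR) from confusion-matrix counts.

    Assumption (not stated in the abstract): phishing is the positive class.
      tp: phishing sites flagged as phishing
      fp: legitimate sites flagged as phishing
      tn: legitimate sites passed as legitimate
      fn: phishing sites passed as legitimate
    """
    negatives = fp + tn  # all truly legitimate sites
    positives = tp + fn  # all truly phishing sites
    if negatives == 0 or positives == 0:
        raise ValueError("need at least one site of each class")

    accuracy = (tp + tn) / (positives + negatives)
    fpr = fp / negatives  # share of legitimate sites wrongly flagged
    fnr = fn / positives  # share of phishing sites missed
    return accuracy, fpr, fnr
```

With illustrative counts of 90 true positives, 5 false positives, 95 true negatives and 10 false negatives, `classification_rates(90, 5, 95, 10)` gives an accuracy of 0.925, an FPR of 0.05 and an FNR of 0.1. Note that FPR and FNR are computed per class, so they do not have to sum to the overall error rate.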

Cite

Hassan, D. (2015). On Determining the Most Effective Subset of Features for Detecting Phishing Websites. International Journal of Computer Applications, 122(20), 1–7. https://doi.org/10.5120/21813-5191
