A self-training method for detection of phishing websites


Abstract

Machine-learning-based phishing detection often lacks training data with high-confidence labels. To reduce the impact of scarce labels in the training set on phishing-detection performance, this paper proposes an improved self-training method for semi-supervised learning. It applies the divide-and-conquer principle, decomposing the original problem into a number of smaller sub-problems that are similar to the original one. We compare classification quality among supervised learning, traditional semi-supervised learning, and the proposed method using four classifiers, and we compare the running time of the two semi-supervised methods. When the improved method divides the unlabeled dataset equally, the running time can be reduced by 50% while keeping the classification performance equal to that of the traditional self-training method. Furthermore, the running time continues to decrease significantly as the number of partitions of the unlabeled dataset increases. The experimental results show that the proposed improved self-training method outperforms the traditional self-training method.
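The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "self-training over an equally divided unlabeled set", not the authors' implementation. The nearest-centroid classifier, the confidence measure, the `threshold` value, and all function names are assumptions chosen to keep the example self-contained; the paper itself evaluates four classifiers that are not named here.

```python
from statistics import mean

def fit_centroids(X, y):
    """Toy stand-in classifier (assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict_with_confidence(centroids, x):
    """Return (nearest label, confidence in [0.5, 1]). Needs >= 2 classes."""
    dists = {
        label: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
        for label, c in centroids.items()
    }
    ranked = sorted(dists, key=dists.get)
    best, second = ranked[0], ranked[1]
    total = dists[best] + dists[second]
    # Hypothetical confidence: relative margin between the two closest classes.
    conf = 1.0 if total == 0 else dists[second] / total
    return best, conf

def split_equally(items, k):
    """Divide the unlabeled set into k near-equal parts (round-robin)."""
    return [items[i::k] for i in range(k)]

def self_train_divided(X_lab, y_lab, X_unlab, k=2, threshold=0.8):
    """Process one part at a time: fit, pseudo-label confident points, move on."""
    X_train, y_train = list(X_lab), list(y_lab)
    for part in split_equally(X_unlab, k):
        model = fit_centroids(X_train, y_train)  # refit once per part
        for x in part:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:  # keep only high-confidence pseudo-labels
                X_train.append(x)
                y_train.append(label)
    return fit_centroids(X_train, y_train), X_train, y_train

# Tiny illustrative data (invented): 0 = legitimate, 1 = phishing.
X_lab = [[0.0, 0.0], [1.0, 1.0]]
y_lab = [0, 1]
X_unlab = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2], [1.0, 0.8], [0.5, 0.5]]

model, X_train, y_train = self_train_divided(X_lab, y_lab, X_unlab, k=2)
print(len(X_train), y_train)
```

In this sketch each part is scored once by a model fitted on everything labeled so far, so a larger `k` means each fit-and-score pass touches fewer unlabeled points. Whether that matches the source of the reported running-time reduction is not stated in the abstract.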


Jia, X. P., & Rong, X. F. (2018). A self-training method for detection of phishing websites. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10943 LNCS, pp. 414–425). Springer Verlag. https://doi.org/10.1007/978-3-319-93803-5_39
