A detection method for phishing web page using DOM-based Doc2Vec model


Abstract

Detecting phishing web pages is a challenging task. Existing detection methods for phishing web pages based on the DOM (Document Object Model) mainly aim to capture structural characteristics, but they ignore the overall representation of a web page and the semantic information that HTML tags may carry. This paper treats the DOM as a natural language and uses the Doc2Vec model to learn its structural semantics automatically in order to detect phishing web pages. First, the DOM structure of the retrieved web page is parsed to construct the DOM tree. The Doc2Vec model is then used to vectorize the DOM tree, and the semantic similarity between web pages is measured by the distance between their DOM vectors. Finally, hierarchical clustering is used to group the web pages. Experiments show that the proposed method achieves higher recall and precision for phishing classification than both the DOM-based structural clustering method and the TF-IDF-based semantic clustering method. The results show that applying Paragraph Vector to the DOM, as a linguistic approach, is effective.
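The pipeline described in the abstract (parse the DOM, vectorize it, measure distances, cluster hierarchically) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a page is reduced to the sequence of its opening tag names, and it swaps in a simple bag-of-tags count vector where the paper uses Doc2Vec embeddings, so that the example runs with the Python standard library alone. The names `dom_tokens`, `cosine_distance`, `agglomerative`, the sample pages, and the threshold of 0.3 are all hypothetical choices made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class TagSequenceParser(HTMLParser):
    """Collects opening tag names in document order (a flattened DOM walk)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def dom_tokens(html):
    """Turn an HTML string into a list of tag names, treated as 'words'."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (an assumed cut-off).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pages: two login-form layouts and one content page.
pages = [
    "<html><body><form><input><input><button></button></form></body></html>",
    "<html><body><form><input><input><input><button></button></form></body></html>",
    "<html><body><div><p></p><p></p><a></a><img></div></body></html>",
]

# Stand-in for the Doc2Vec step: one tag-count vector per page.
vectors = [Counter(dom_tokens(page)) for page in pages]
clusters = agglomerative(vectors, threshold=0.3)
print(clusters)  # [[0, 1], [2]]
```

In this toy run the two form-based pages land in one cluster and the content page in another. To follow the paper more closely, the tag-count vectors would be replaced by vectors inferred from a Doc2Vec model trained on the tag sequences, with the distance and clustering steps left unchanged.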


Feng, J., Zhang, Y., & Qiao, Y. (2020). A detection method for phishing web page using DOM-based Doc2Vec model. Journal of Computing and Information Technology, 28(1), 19–31. https://doi.org/10.20532/cit.2020.1004899
