PhishRepo: A Seamless Collection of Phishing Data to Fill a Research Gap in the Phishing Domain

Abstract

Machine learning-based anti-phishing solutions struggle to collect diverse, multi-modal phishing data. Consequently, most previous works have been trained with little or no multi-modal data, which introduces several drawbacks. This study therefore develops a phishing data repository to meet the diverse data needs of the anti-phishing domain. The proposed gap-filling solution, PhishRepo, is an online data repository that collects, verifies, disseminates, and archives phishing data. Its design includes automated submission, deduplication filtering, automated verification, crowdsourcing-based human interaction, an objection reporting window, and targeted attack prevention techniques. The deduplication filter, used here for the first time in phishing data collection, had a significant impact on the collection process: it eliminated duplicate data, a cause of data leakage, which is one of the most common machine learning errors. In addition, PhishRepo lets researchers apply modern machine learning techniques effectively by relieving them of the burden of gathering phishing data. Thoughtful use of PhishRepo should therefore lead to more effective anti-phishing solutions and help reduce the social engineering crime known as phishing.
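
The abstract does not say how the deduplication filter works. As a rough illustration of the idea only, the Python sketch below drops submissions whose page content is identical after whitespace and case normalisation, so that the same page cannot end up in both a training and a test split. The `Submission` record, the `fingerprint` and `deduplicate` functions, and the choice of exact content hashing are all assumptions made for this example, not PhishRepo's actual method.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical record for one submitted phishing page."""
    url: str
    html: str


def fingerprint(sub: Submission) -> str:
    """Hash the page content after collapsing whitespace and lowercasing."""
    normalized = " ".join(sub.html.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(subs):
    """Keep only the first submission seen for each content fingerprint."""
    seen = set()
    unique = []
    for sub in subs:
        key = fingerprint(sub)
        if key not in seen:
            seen.add(key)
            unique.append(sub)
    return unique
```

Exact hashing only catches identical copies. Near-duplicates, such as the same phishing kit served with a different token in the page, would need a similarity-based comparison instead.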

Citation (APA)

Ariyadasa, S., Fernando, S., & Fernando, S. (2022). PhishRepo: A Seamless Collection of Phishing Data to Fill a Research Gap in the Phishing Domain. International Journal of Advanced Computer Science and Applications, 13(5), 850–865. https://doi.org/10.14569/IJACSA.2022.0130597
