Abstract
Existing phishing detection techniques mainly rely on blacklists or content-based analysis, which are not only evadable but also reactive, and therefore exhibit considerable detection delays. Through an in-depth analysis, we observe that artifacts of phishing are manifested in various sources of intelligence related to a domain even before its contents are online. In particular, we study novel patterns and characteristics computed from viable data sources, including Certificate Transparency logs and passive DNS records. To compare benign and phishing domains, we construct thoroughly verified, realistic benign and phishing datasets. Our analysis shows clear differences between benign and phishing domains that can pave the way for content-agnostic approaches to predicting phishing domains even before the contents of these webpages are up and running. To demonstrate the usefulness of our analysis, we train a classifier with distinctive features and show that we can (1) perform content-agnostic predictions with a very low false positive rate (FPR) of 0.3%, high precision (98%), and high recall (90%), and (2) predict phishing domains days before they are discovered by state-of-the-art content-based tools such as VirusTotal.
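To make the reported metrics concrete, the following minimal Python sketch shows how FPR, precision, and recall are derived from a binary confusion matrix. The `ConfusionCounts` class and the counts passed to it are hypothetical, chosen only for illustration; they are not taken from the paper's datasets or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier where 'positive' means phishing."""
    tp: int  # phishing domains flagged as phishing
    fp: int  # benign domains flagged as phishing
    tn: int  # benign domains passed as benign
    fn: int  # phishing domains passed as benign


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged domains that are actually phishing."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Fraction of phishing domains that are flagged."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of benign domains that are wrongly flagged."""
    return c.fp / (c.fp + c.tn)


# Hypothetical counts for illustration only (not the paper's data).
counts = ConfusionCounts(tp=90, fp=2, tn=998, fn=10)

if __name__ == "__main__":
    print(f"precision = {precision(counts):.3f}")
    print(f"recall    = {recall(counts):.3f}")
    print(f"FPR       = {false_positive_rate(counts):.3f}")
```

Because FPR is normalized by the number of benign domains while precision is normalized by the number of flagged domains, the two can diverge sharply when benign domains greatly outnumber phishing ones, which is why both figures are worth reporting.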
Cite
Alsabah, M., Nabeel, M., Boshmaf, Y., & Choo, E. (2022). Content-Agnostic Detection of Phishing Domains using Certificate Transparency and Passive DNS. In ACM International Conference Proceeding Series (pp. 446–459). Association for Computing Machinery. https://doi.org/10.1145/3545948.3545958