Judging a site by its content: Learning the textual, structural, and visual features of malicious web pages

Sushma Nagesh Bannur; Lawrence K. Saul; Stefan Savage

Conference Proceedings

Proceedings of the ACM Conference on Computer and Communications Security (2011) 1-9

DOI: 10.1145/2046684.2046686

23 Citations

56 Readers

Abstract

The physical world is rife with cues that allow us to distinguish between safe and unsafe situations. By contrast, the Internet offers a much more ambiguous environment; hence many users are unable to distinguish a scam from a legitimate Web page. To help address this problem, we explore how to train classifiers that can automatically identify malicious Web pages based on clues from their textual content, structural tags, page links, visual appearance, and URLs. Using a contemporary labeled data feed from a large Web mail provider, we extract such features and demonstrate how they can be used to improve classification accuracy over previous, more constrained approaches. In particular, by analyzing the full content of individual Web pages, we more than halve the error rate obtained by a comparably trained classifier that only extracts features from URLs. By training classifiers on different sets of features, we are further able to assess the strength of clues provided by these different sources of information. © 2011 ACM.
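The feature sources named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `text:`/`tag:`/`link:`/`url:` prefixes, and the tokenization rule are all assumptions chosen for illustration, and visual-appearance features are left out because the abstract does not say how they are computed. The sketch uses only the Python standard library and stops at feature extraction; any classifier could be trained on the resulting counts.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageFeatureParser(HTMLParser):
    """Collects words, tag names, and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = Counter()   # textual content
        self.tags = Counter()    # structural tags
        self.links = []          # href targets of <a> elements

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Hypothetical tokenization: lowercase alphanumeric runs.
        # Script and style text is not filtered out in this sketch.
        self.words.update(re.findall(r"[a-z0-9]+", data.lower()))


def extract_features(url, html):
    """Return a bag of counts, each key prefixed with its source."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()

    features = Counter()
    for word, n in parser.words.items():
        features["text:" + word] = n
    for tag, n in parser.tags.items():
        features["tag:" + tag] = n

    # Page links: count targets on the page's own host vs. other hosts.
    page_host = urlparse(url).netloc
    for href in parser.links:
        host = urlparse(href).netloc
        kind = "external" if host and host != page_host else "internal"
        features["link:" + kind] += 1

    # URL features: tokens of the page's own URL.
    for token in re.findall(r"[a-z0-9]+", url.lower()):
        features["url:" + token] += 1
    return features


def select_sources(features, sources):
    """Keep only features from the named sources, e.g. ["url"]."""
    prefixes = tuple(s + ":" for s in sources)
    return {k: v for k, v in features.items() if k.startswith(prefixes)}
```

Keeping the source in each feature name makes it straightforward to train one classifier on `select_sources(features, ["url"])` and another on all features, which is the kind of comparison between feature sets that the abstract describes.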

Cite (APA)

Bannur, S. N., Saul, L. K., & Savage, S. (2011). Judging a site by its content: Learning the textual, structural, and visual features of malicious web pages. In Proceedings of the ACM Conference on Computer and Communications Security (pp. 1–9). https://doi.org/10.1145/2046684.2046686
