Abstract
In this work, we tackle the problem of spam detection on the Web. Spam web pages have become a problem for Web search engines due to the negative effects this phenomenon can have on their retrieval results. Our approach is based on a random-walk algorithm that ranks pages according to their relevance and their spam likelihood. We introduce the novelty of taking the content of web pages into account both to characterize the web graph and to obtain an a priori estimate of each page's spam likelihood. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how bad or good (spam-like or not) a web page is, according to its textual content and its relations in the graph. Our experiments show that the proposed technique outperforms other link-based techniques for spam detection. © 2011 Springer-Verlag.
Ortega, F. J., Macdonald, C., Troyano, J. A., Cruz, F. L., & Enríquez, F. (2011). Combining textual content and hyperlinks in web spam detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6716 LNCS, pp. 266–269). https://doi.org/10.1007/978-3-642-22327-3_35