The research of spam web page detection method based on web page differentiation and concrete cluster centers

Abstract

The PageRank algorithm assigns link weights evenly and ignores the authority of web pages. To address this, we propose an improved PageRank algorithm based on web page differentiation (DPR), which evaluates a page's authority according to its number of links and, when assigning PR values, weights each page according to its authoritativeness. To improve the clustering stability and accuracy of the K-Means algorithm, we combine DPR with K-Means to design a differentiation page-based K-Means (DPK-Means) algorithm. This algorithm sorts the pages by the PR values obtained from the DPR algorithm and then sets concrete cluster centers according to the resulting order. Experiments show that, in spam detection, DPR is superior to PageRank in terms of number of pages, recall rate, accuracy, and F-Measure value, and that DPK-Means performs better than K-Means.
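The abstract does not give the DPR weighting formula or the rule for choosing cluster centers, so the following Python sketch is only one possible reading of the idea, not the authors' method. It assumes that a page's authority can be approximated by its in-link count plus one, that each page splits its PR among its out-links in proportion to the targets' authority, and that the initial K-Means centers are taken at evenly spaced positions in the PR ranking. The function names `dpr`, `initial_centers`, and `kmeans_1d`, the damping factor, and the one-dimensional clustering on PR values are all hypothetical choices made for illustration.

```python
def dpr(links, damping=0.85, iters=50):
    """Authority-weighted PageRank sketch.

    links maps each page to the list of pages it links to.
    Assumption: authority = in-link count + 1 (not taken from the paper).
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    authority = {p: 1 for p in pages}
    for targets in links.values():
        for t in targets:
            authority[t] += 1

    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if not targets:
                # Dangling page: spread its score evenly over all pages.
                for q in pages:
                    nxt[q] += damping * pr[p] / n
                continue
            total = sum(authority[t] for t in targets)
            for t in targets:
                # Plain PageRank would use 1 / len(targets) here.
                nxt[t] += damping * pr[p] * authority[t] / total
        pr = nxt
    return pr


def initial_centers(scores, k):
    """Pick k deterministic centers from the PR ranking (requires k <= len(scores))."""
    ranked = sorted(scores.values(), reverse=True)
    size = len(ranked) / k
    # Take the score in the middle of each of k contiguous rank segments.
    return [ranked[int((i + 0.5) * size)] for i in range(k)]


def kmeans_1d(scores, centers, iters=20):
    """K-Means on the one-dimensional PR scores, starting from fixed centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for page, s in scores.items():
            idx = min(range(len(centers)), key=lambda i: abs(s - centers[i]))
            groups[idx].append(page)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(scores[p] for p in g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return groups


# Toy link graph, invented for this example.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"], "e": ["c", "a"]}
scores = dpr(links)
clusters = kmeans_1d(scores, initial_centers(scores, k=2))
```

Because the starting centers come from the ranking rather than from random sampling, repeated runs on the same graph give the same clusters, which is the stability property the abstract attributes to DPK-Means. A real spam detector would presumably cluster on richer page features than the PR value alone.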

Yu, M., Zhang, J., Wang, J., Gao, J., Xu, T., & Yu, R. (2018). The research of spam web page detection method based on web page differentiation and concrete cluster centers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10874 LNCS, pp. 820–826). Springer Verlag. https://doi.org/10.1007/978-3-319-94268-1_73
