To address the PageRank algorithm's weakness of distributing link weight evenly and ignoring the authority of web pages, we propose an improved PageRank algorithm based on web page differentiation (DPR), which evaluates each page's authority according to its number of links and assigns weights in proportion to that authority when distributing PR values. To improve the clustering stability and accuracy of the K-Means algorithm, we combine DPR with K-Means to design a differentiation page-based K-Means (DPK-Means) algorithm, which sorts the pages by the PR values obtained from the DPR algorithm and then fixes concrete cluster centers according to the sorted result. Experiments show that, in spam detection, DPR is superior to PageRank in terms of number of pages, recall, accuracy, and F-Measure, and that DPK-Means performs better than K-Means.
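The two ideas can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the weighting rule (each out-link's share is proportional to the target page's in-link count) and the seeding rule (cluster centers taken at evenly spaced positions in the PR-sorted list) are assumptions made for illustration, as are all function names and the default parameters.

```python
def differentiated_pagerank(links, damping=0.85, iterations=50):
    """PageRank variant with uneven link weights (illustrative assumption).

    links maps each page to the list of pages it links to. Instead of
    splitting a page's rank evenly, each target receives a share
    proportional to its in-link count, used here as a proxy for authority.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    in_degree = {p: 0 for p in pages}
    for src in pages:
        for dst in links.get(src, []):
            in_degree[dst] += 1
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            total = sum(in_degree[t] for t in targets)
            for dst in targets:
                new_rank[dst] += damping * rank[src] * in_degree[dst] / total
        rank = new_rank
    return rank


def seed_centers_from_ranking(rank, features, k):
    """Pick k fixed initial centers from the PR-sorted page list.

    Assumption: centers are taken at evenly spaced positions of the
    ranking, so the seeds are deterministic rather than random.
    """
    ordered = sorted(rank, key=lambda p: (-rank[p], p))
    step = len(ordered) / k
    return [list(features[ordered[int(i * step)]]) for i in range(k)]


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for pt in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(pt, centers[i])),
            )
            clusters[nearest].append(pt)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```

With this sketch, a page feature table (hypothetical here) would be clustered by calling `differentiated_pagerank` on the link graph, passing the result to `seed_centers_from_ranking`, and handing those seeds to `kmeans`. Because the seeds come from a deterministic ordering, repeated runs yield the same clusters, which is one plausible reading of the stability improvement the abstract claims.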
CITATION STYLE
Yu, M., Zhang, J., Wang, J., Gao, J., Xu, T., & Yu, R. (2018). The research of spam web page detection method based on web page differentiation and concrete cluster centers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10874 LNCS, pp. 820–826). Springer Verlag. https://doi.org/10.1007/978-3-319-94268-1_73