Finding credible pages is a challenging problem on the Web. Our key observation in this paper is that credible pages usually link to credible pages with related content. This differs from the assumption used in spam page detection, where a normal page usually links to other normal pages. We propose a novel method to find credible pages based on a trust web graph that we define. The method first measures the content correlation between pages connected by hyperlinks, then combines the web link structure with these content correlation values to build the trust web graph. Finally, credible pages are found by using the trust relations between vertices of the trust web graph. We construct a real-world data set by crawling millions of pages on the Web and run a set of experiments on it. Experimental results show that the accuracy of this method is close to 80% and that its efficiency is also improved. © 2012 Springer-Verlag Berlin Heidelberg.
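The three steps described in the abstract (measure content correlation along hyperlinks, build a weighted trust graph, then use trust relations between vertices) can be illustrated with a minimal sketch. This is not the authors' FindCredPg implementation: the abstract does not specify the correlation measure or the trust computation, so the sketch assumes bag-of-words cosine similarity as the correlation, a hypothetical `threshold` for dropping unrelated links, and a seed-based, damped propagation of trust along weighted edges. All function names, parameters, and the sample pages are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Assumed content-correlation measure: cosine of term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_trust_graph(pages, links, threshold=0.1):
    """Keep a hyperlink only if the linked pages are content-related.

    pages: dict mapping URL -> page text
    links: iterable of (source URL, target URL) pairs
    Returns dict mapping source -> {target: correlation weight}.
    """
    graph = {url: {} for url in pages}
    for src, dst in links:
        weight = cosine_similarity(pages[src], pages[dst])
        if weight >= threshold:  # hypothetical cut-off for "related"
            graph[src][dst] = weight
    return graph


def propagate_trust(graph, seeds, damping=0.85, iterations=20):
    """Assumed trust computation: spread trust from hand-picked seed pages
    along edges, in proportion to each edge's correlation weight."""
    base = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in graph}
    trust = dict(base)
    for _ in range(iterations):
        nxt = {u: (1 - damping) * base[u] for u in graph}
        for src, out_edges in graph.items():
            total = sum(out_edges.values())
            if total == 0:
                continue
            for dst, weight in out_edges.items():
                nxt[dst] += damping * trust[src] * weight / total
        trust = nxt
    return trust


# Toy data: "a" and "b" share a topic, "c" is unrelated to "a".
pages = {
    "a": "flu vaccine safety clinical trial results",
    "b": "clinical trial results for flu vaccine",
    "c": "cheap watches casino bonus",
}
links = [("a", "b"), ("a", "c"), ("b", "a")]

graph = build_trust_graph(pages, links)
trust = propagate_trust(graph, seeds={"a"})
```

In this toy example the link from "a" to "c" is dropped because the two pages share no terms, so "c" receives no trust, while "b" inherits trust from the seed page "a" through a content-related link. Pages could then be ranked by their trust score, with a cut-off chosen on labeled data.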
CITATION STYLE
Wang, T., Zhu, Q., Wang, S., & Liang, J. (2012). FindCredPg: A novel method to find credible pages based on trust web graph. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7235 LNCS, pp. 282–293). https://doi.org/10.1007/978-3-642-29253-8_24