News page discovery policy for instant crawlers

Yong Wang; Yiqun Liu; Min Zhang; Shaoping Ma

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4993 LNCS, pp. 520-525 (2008)

DOI: 10.1007/978-3-540-68636-1_58

Abstract

Many news pages with strict freshness requirements are published on the internet every day. Instant crawlers should download them immediately; otherwise, they soon become outdated. In the past, instant crawlers downloaded pages only from a manually generated list of news websites. Because news websites do not publish news pages exclusively, bandwidth is wasted downloading non-news pages. This paper proposes a novel approach to discovering news pages that combines seed selection with news URL prediction based on user behavior analysis. Empirical studies on a two-month user access log show that our approach outperforms the traditional approach in both precision and recall. © 2008 Springer-Verlag Berlin Heidelberg.

Cite

Wang, Y., Liu, Y., Zhang, M., & Ma, S. (2008). News page discovery policy for instant crawlers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4993 LNCS, pp. 520–525). https://doi.org/10.1007/978-3-540-68636-1_58
