Analyzing information retrieval methods to recover broken web links

Juan Martinez-Romo; Lourdes Araujo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 5993 LNCS 26-37

DOI: 10.1007/978-3-642-12275-0_6

7 Citations

16 Readers

Abstract

In this work we compare different techniques for automatically finding candidate web pages to replace broken links. We extract information from the anchor text, the content of the page containing the link, and the cached page in a digital library. The selected information is processed and submitted to a search engine. We have compared different information retrieval methods both for selecting the terms used to construct the queries submitted to the search engine and for ranking the candidate pages it returns, in order to help the user find the best replacement. In particular, we have used term frequencies and a language model approach for the selection of terms, and co-occurrence measures and a language model approach for ranking the final results. To test the different methods, we have also defined a methodology that does not require user judgments, which increases the objectivity of the results. © 2010 Springer-Verlag Berlin Heidelberg.
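
The pipeline outlined in the abstract (select query terms, then rank the pages a search engine returns) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the tiny stopword list, the use of raw term frequency for query expansion, and the Dice coefficient as a stand-in for the co-occurrence measures, which the abstract does not name. The search engine call is omitted; candidates are passed in as a plain dictionary.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is", "with"}


def tokenize(text):
    """Lowercase the text, split it into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_query(anchor_text, page_text, extra_terms=3):
    """Keep the anchor terms and expand them with the most frequent page terms."""
    # dict.fromkeys removes duplicates while preserving the original order.
    anchor_terms = list(dict.fromkeys(tokenize(anchor_text)))
    counts = Counter(t for t in tokenize(page_text) if t not in anchor_terms)
    expansion = [term for term, _ in counts.most_common(extra_terms)]
    return anchor_terms + expansion


def dice(terms_a, terms_b):
    """Dice coefficient between two term collections, treated as sets."""
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_candidates(reference_text, candidates):
    """Order candidate URLs by decreasing Dice similarity to the reference text.

    `candidates` maps each URL to the text of the page (assumed already fetched).
    """
    ref_terms = tokenize(reference_text)
    scored = [(dice(ref_terms, tokenize(text)), url) for url, text in candidates.items()]
    return [url for _, url in sorted(scored, key=lambda pair: -pair[0])]
```

In this sketch the reference text could be the anchor text, the context of the link, or a cached copy of the missing page; which of these works best is exactly the kind of question the paper evaluates.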

Cite

APA

Martinez-Romo, J., & Araujo, L. (2010). Analyzing information retrieval methods to recover broken web links. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5993 LNCS, pp. 26–37). Springer Verlag. https://doi.org/10.1007/978-3-642-12275-0_6
