A Framework for Web Archiving and Guaranteed Retrieval

A. Devendran; K. Arunkumar

Conference Proceedings

Advances in Intelligent Systems and Computing (2020) 1016 205-215

DOI: 10.1007/978-981-13-9364-8_16

Abstract

As of today, ‘web.archive.org’ has more than 338 billion web pages archived. How many of those pages are 100% retrievable? How many pages were left out or ignored just because they had some compatibility issue? How many of them were in vernacular languages and encoded in different formats (before Unicode was standardized)? And that covers only the text content type; consider other MIME types, which are encoded and decoded with different algorithms. The underlying reason lies in the fundamental representation of digital data: a sequence of 0s and 1s makes no sense unless it is decoded properly. The browsers that could render a page properly at the time of archiving may since have become obsolete, or they and their platforms may have been upgraded so far that they no longer recognize old formats. We studied related work on data preservation and web archiving and propose a new framework that stores the exact client browser details (user-agent) in the WARC record and uses them to load the corresponding browser on the client side and render the archived content.
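
The idea of keeping the user-agent with the archived record can be illustrated with a minimal Python sketch. It is not the authors' implementation: the header name `X-Client-User-Agent`, the simplified WARC-style layout, and the helpers `build_record`, `read_user_agent`, and `pick_browser` are all assumptions made for illustration. The sketch writes the user-agent into the record headers at archive time, then reads it back at replay time to decide which browser family and major version to load.

```python
import re
import uuid
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical extension field; not defined by the WARC standard or the paper.
UA_FIELD = "X-Client-User-Agent"


def build_record(target_uri: str, payload: bytes, user_agent: str) -> bytes:
    """Serialize a simplified WARC-style record that also carries the user-agent."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"{UA_FIELD}: {user_agent}",
        f"Content-Length: {len(payload)}",
    ]
    # Headers end with a blank line; the record ends with two CRLFs.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + payload + b"\r\n\r\n"


def read_user_agent(record: bytes) -> Optional[str]:
    """Return the stored user-agent, or None if the record does not carry one."""
    header_block = record.split(b"\r\n\r\n", 1)[0].decode("utf-8")
    for line in header_block.split("\r\n")[1:]:  # skip the "WARC/1.0" version line
        name, _, value = line.partition(":")
        if name.strip().lower() == UA_FIELD.lower():
            return value.strip()
    return None


def pick_browser(user_agent: str) -> Tuple[str, str]:
    """Map a user-agent string to a (browser family, major version) pair."""
    match = re.search(r"(Firefox|Chrome|MSIE)[/ ](\d+)", user_agent)
    return (match.group(1), match.group(2)) if match else ("unknown", "")
```

A replay tool built along these lines would call `read_user_agent` on a stored record and pass the result to `pick_browser` to choose a matching browser build. How that browser is actually obtained and launched on the client side is outside the scope of this sketch.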

Citation (APA)

Devendran, A., & Arunkumar, K. (2020). A Framework for Web Archiving and Guaranteed Retrieval. In Advances in Intelligent Systems and Computing (Vol. 1016, pp. 205–215). Springer. https://doi.org/10.1007/978-981-13-9364-8_16
