Robots Still Outnumber Humans in Web Archives, But Less Than Before

3Citations
Citations of this article
4Readers
Mendeley users who have this article in their library.
Get full text

Abstract

To identify robots and humans and analyze their respective access patterns, we used the Internet Archive’s (IA) Wayback Machine access logs from 2012 and 2019, as well as Arquivo.pt’s (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based on their browsing behavior. To better understand how users navigate through the web archives, we evaluated these sessions to discover user access patterns. Based on the two archives and between the two years of IA access logs (2012 vs. 2019), we present a comparison of detected robots vs. humans and their user access patterns and temporal preferences. The total number of robots detected in IA 2012 is greater than in IA 2019 (21% more in requests and 18% more in sessions). Robots account for 98% of requests (97% of sessions) in Arquivo.pt (2019). We found that the robots are almost entirely limited to “Dip” and “Skim” access patterns in IA 2012, but exhibit all the patterns and their combinations in IA 2019. Both humans and robots show a preference for web pages archived in the near past.

Cite

CITATION STYLE

APA

Jayanetti, H. R., Garg, K., Alam, S., Nelson, M. L., & Weigle, M. C. (2022). Robots Still Outnumber Humans in Web Archives, But Less Than Before. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13541 LNCS, pp. 245–259). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16802-4_19

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free