Four datasets derived from an archive of personal homepages (1995–2009)

Abstract

While data from social media are easily accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability (the mid-to-late 1990s) has proved difficult. In this data deposit, I describe how archival data from Geocities homepages were retrieved and processed to remove non-text data, then further refined to create separate datasets, each of which provides unique insights into modes of personal expression on the early Internet. The present paper describes four datasets, all of which were derived from a larger collection of personal websites: (1) a large corpus of raw text data from Geocities personal homepages, (2) a linguistic analysis of basic psychological properties of the same Geocities pages, using an open-source implementation of the Linguistic Inquiry and Word Count (LIWC), (3) a dataset of links between homepages (suitable for network analysis), and (4) a manifest dataset summarizing the size and last update date for each file in the dataset. Data from over 378,000 Geocities pages are included. In addition to providing a detailed description of how these datasets were created, I describe how they might be utilized in future research.
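
To make the processing described in the abstract more concrete, the following is a minimal sketch of how a single archived homepage could be reduced to plain text and a list of outgoing links. It is not the author's pipeline: the class name `PageExtractor`, the function `extract`, and the choice of Python's standard-library `html.parser` are assumptions made purely for illustration.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and link targets from one page."""

    SKIP = {"script", "style"}  # tags whose contents are not prose

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments, in document order
        self.links = []        # href values of <a> tags, in document order
        self._skip_depth = 0   # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (plain_text, links) for one HTML document given as a string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

Under these assumptions, the text output would feed a corpus and word-count analysis, while the collected links could be written out as source and target pairs for network analysis.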

Cite

Rife, S. C. (2017). Four datasets derived from an archive of personal homepages (1995–2009). Data, 2(2). https://doi.org/10.3390/data2020019
